GRADUATE STUDIES 203 
IN MATHEMATICS 


The Distribution 
of Prime Numbers 


Dimitris Koukoulopoulos 


AMERICAN MATHEMATICAL SOCIETY




Providence, Rhode Island 


EDITORIAL COMMITTEE 


Daniel S. Freed (Chair) 
Bjorn Poonen 
Gigliola Staffilani 
Jeff A. Viaclovsky 


2010 Mathematics Subject Classification. Primary 11-01, 11Nxx, 11Mxx. 


For additional information and updates on this book, visit 
www.ams.org/bookpages/gsm-203 


Library of Congress Cataloging-in-Publication Data 


Names: Koukoulopoulos, Dimitris, 1984— author. 

Title: The distribution of prime numbers / Dimitris Koukoulopoulos. 

Description: Providence, Rhode Island: American Mathematical Society, [2019] | Series: Graduate 
studies in mathematics, 1065-7339; volume 203 | Includes bibliographical references and index. 

Identifiers: LCCN 2019028661 | ISBN 9781470447540 (hardcover) | ISBN 9781470454203 (ebook) 

Subjects: LCSH: Numbers, Prime. | AMS: Number theory — Instructional exposition (textbooks, 
tutorial papers, etc.). | Number theory — Multiplicative number theory. | Number theory — 
Zeta and L-functions: analytic theory. 

Classification: LCC QA246 .K68 2019 | DDC 512.7/3-dc23 

LC record available at https://lccn.loc.gov/2019028661 


Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting 
for them, are permitted to make fair use of the material, such as to copy select pages for use 
in teaching or research. Permission is granted to quote brief passages from this publication in 
reviews, provided the customary acknowledgment of the source is given. 

Republication, systematic copying, or multiple reproduction of any material in this publication 
is permitted only under license from the American Mathematical Society. Requests for permission 
to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For 
more information, please visit www.ams.org/publications/pubpermissions. 

Send requests for translation rights and licensed reprints to reprint-permission@ams.org. 


© 2019 by the American Mathematical Society. All rights reserved. 
The American Mathematical Society retains all rights 
except those granted to the United States Government. 
Printed in the United States of America. 


The paper used in this book is acid-free and falls within the guidelines 
established to ensure permanence and durability. 
Visit the AMS home page at https://www.ams.org/ 


10 9 8 7 6 5 4 3 2 1        24 23 22 21 20 19


To Jennifer 


Contents 


Preface 
Notation 


And then there were infinitely many 


Part 1. First principles 

Chapter 1. Asymptotic estimates 

Chapter 2. Combinatorial ways to count primes 
Chapter 3. The Dirichlet convolution 


Chapter 4. Dirichlet series 


Part 2. Methods of complex and harmonic analysis 
Chapter 5. An explicit formula for counting primes 
Chapter 6. The Riemann zeta function 

Chapter 7. The Perron inversion formula 

Chapter 8. The Prime Number Theorem 

Chapter 9. Dirichlet characters 

Chapter 10. Fourier analysis on finite abelian groups 
Chapter 11. Dirichlet L-functions 


Chapter 12. The Prime Number Theorem for arithmetic 
progressions 




Part 3. Multiplicative functions and the anatomy of integers 


Chapter 13. Primes and multiplicative functions 

Chapter 14. Evolution of sums of multiplicative functions 
Chapter 15. The distribution of multiplicative functions 
Chapter 16. Large deviations 


Part 4. Sieve methods 


Chapter 17. Twin primes 

Chapter 18. The axioms of sieve theory 

Chapter 19. The Fundamental Lemma of Sieve Theory 
Chapter 20. Applications of sieve methods 

Chapter 21. Selberg’s sieve 

Chapter 22. Sieving for zero-free regions 


Part 5. Bilinear methods 


Chapter 23. Vinogradov’s method 

Chapter 24. Ternary arithmetic progressions 

Chapter 25. Bilinear forms and the large sieve 

Chapter 26. The Bombieri-Vinogradov theorem 

Chapter 27. The least prime in an arithmetic progression 


Part 6. Local aspects of the distribution of primes 
Chapter 28. Small gaps between primes 
Chapter 29. Large gaps between primes 


Chapter 30. Irregularities in the distribution of primes 


Appendices 

Appendix A. The Riemann-Stieltjes integral 
Appendix B. The Fourier and the Mellin transforms 
Appendix C. The method of moments 

Bibliography 


Index 




Preface 


The main goal of this book is to introduce beginning graduate students 
to analytic number theory. In addition, large parts of it are suitable for 
advanced undergraduate students with a good grasp of analytic techniques. 


Throughout, the emphasis has been put on exposing the main ideas
rather than providing the most general results known. Any student wishing
to do serious research in analytic number theory should broaden and deepen
their knowledge by consulting some of the several excellent research-level
books on the subject. Examples include: the books of Davenport and
of Montgomery-Vaughan for classical multiplicative number theory;
Tenenbaum’s book for probabilistic number theory and the saddle-point
method; the book by Iwaniec-Kowalski for the general theory of
L-functions, of modular forms and of exponential sums; Montgomery’s book
for the harmonic analytic aspects of analytic number theory; and the
book of Friedlander-Iwaniec for sieve methods.


Using the book 


The book borrows the structure of Davenport’s masterpiece Multiplicative 
Number Theory with several short- to medium-length chapters. Each chap- 
ter is accompanied by various exercises. Some of them aim to exemplify 
the concepts discussed, while others are used to guide the students to self- 
discover certain more advanced topics. A star next to an exercise indicates 
that its solution requires total mastery of the material. 


The contents of the book are naturally divided into six parts as indicated
in the table of contents. The first two parts study elementary and classical
complex-analytic methods. They could thus serve as the manual for an
introductory graduate course to analytic number theory. The last three
parts of the book are devoted to the theory of sieves: Part 4 presents the
basic elements of the theory of the small sieve, whereas Part 5 explores the
method of bilinear sums and develops the large sieve. These techniques
are then combined in Part 6 to study the spacing distribution of prime
numbers and prove some of the recent spectacular results about small and
large gaps between primes. Finally, Part 3 studies multiplicative functions
and the anatomy of integers, and serves as a bridge between the complex-analytic
techniques and the more elementary theory of sieves. Topics from
it could be presented either at the end of an introductory course to analytic
number theory (Chapter 13 most appropriately), or at the beginning of a
more advanced course on sieves (the most relevant material is contained in
Chapters 14 and 15, as well as in Theorem 16.1).


Certain portions of the book can be used as a reference for an
undergraduate course. More precisely, Chapters 1-8 can serve as the core of such
a course, followed by a selection of topics from Chapters and PI 


A short guide to the main theorems of the book. Below is a list of 
the main results proven and of their prerequisites. 


Chebyshev’s and Mertens’ estimates are presented in Chapters 2 and 3,
respectively. Their proofs rest on the material contained in Part 1.


The landmark Prime Number Theorem is proven in Chapter 8. Understanding
it requires a good grasp of all preceding chapters.


The Siegel-Walfisz theorem, which is a uniform version of the Prime
Number Theorem for arithmetic progressions, is presented in Chapter 12.
Its proof builds on all of the material preceding it.


The Landau-Selberg-Delange method is a key tool in the study of
multiplicative functions. It is presented in Chapter 13. Appreciating its proof
requires a firm understanding of Chapters for the main analytic tools,
as well as of Chapter 12 for dealing with uniformity issues.


The foundations of probabilistic number theory are explained in Chapters
15 and 16, where the Erdős-Kac theorem and the Sathe-Selberg theorem are
proven. The main prerequisites can be found in Part 1 and in Chapter
In addition, Chapter 13 is needed for the Sathe-Selberg theorem.

The Fundamental Lemma of Sieve Theory is proven in Chapter 19. Its
proof uses ideas and techniques from Part 1 and Chapters

Vinogradov’s method, one of the foundations of modern analytic number
theory, is presented in Chapter 23. It builds on the material of Chapters
and 




The Hardy-Littlewood circle method is presented in Chapter 24. It is
used to detect additive patterns among the primes and, more specifically, to 
count ternary arithmetic progressions all of whose members are primes. 


The Bombieri-Vinogradov theorem, often called the “Generalized Riemann
Hypothesis on average”, is established in Chapter 26. Understanding
its proof requires mastery of Vinogradov’s method (Chapter 23) and of the
large sieve (Chapter 25).


Linnik’s theorem provides a very strong bound on the least prime in an
arithmetic progression. It is proven in Chapter 27, and its prerequisites are
Chapters 1-12, 20 and 22.


The breakthrough of Zhang-Maynard-Tao about the existence of infinitely
many bounded gaps between primes is presented in Chapter 28. Its
proof requires a firm understanding of the Fundamental Lemma of Sieve
Theory (Chapter 19), of Selberg’s sieve (Chapter 21) and of the
Bombieri-Vinogradov theorem (Chapter 26).


The recent developments about large gaps between primes of
Ford-Green-Konyagin-Tao and Maynard are presented in Chapter 29. Understanding
them necessitates knowledge of the same concepts as the proof of the
existence of bounded gaps between primes, with the addition of the results on
smooth numbers presented in Chapters 14 and 16.


Maier discovered in 1985 that the distribution of prime numbers has
certain unexpected irregularities. His results are presented in Chapter 30,
and they assume knowledge of Linnik’s theorem (and of its prerequisites), as
well as of Buchstab’s function (see Chapter 14 and, more precisely, Theorem
14.4).


Acknowledgments 


Many people have helped me greatly in many different ways in writing this 
book. 


I am indebted to Leo Goldmakher and James Maynard, with whom 
I discussed the contents of the book extensively at various stages of the 
writing process. In addition, an early version of the manuscript was used as 
a teaching reference by Wei Ho at the University of Michigan, and by Leo 
Goldmakher at Williams College. I am grateful to them and their students 
for the valuable feedback they provided. 


I am obliged to Martin Cech, Tony Haddad, Youcef Mokrani, Alexis 
Leroux-Lapierre, Joélle Matte, Kunjakanan Nath, Stelios Sachpazis, Simon 
St-Amant, Jeremie Turcotte and Peter Zenz, who patiently studied earlier 
versions of the book, catching various errors and providing many excellent 
comments. 




I have had very useful mathematical conversations with Sandro Bettin, 
Brian Conrey, Chantal David, Ben Green, Adam Harper, Jean Lagacé and 
K. Soundararajan on certain topics of the book; I am grateful to them for 
their astute remarks. Furthermore, I would like to thank the anonymous 
reviewers for their suggestions that helped me improve the exposition of the 
ideas in the manuscript, especially those related to the bilinear methods 
presented in Part 5.


I am indebted to Kevin Ford and Andrew Granville, who taught me 
analytic number theory. Their influence is evident throughout the book. 


A special thanks goes to Ina Mette, Marcia Almeida and Becky Rivard 
for guiding me through the publishing process. I would also like to thank 
Brian Bartling and Barbara Beeton for their assistance with several typeset- 
ting questions, as well as Alexis Leroux-Lapierre for his help with designing 
the figures that appear in the book. 


Last but not least, I would like to thank my wife Jennifer Crisafulli 
for her love, support and companionship. This book could not have been 
written without her and I wholeheartedly dedicate it to her. 


Funding. During the writing process, I was supported by the Natural Sci- 
ences and Engineering Research Council of Canada (Discovery Grant 2018- 
05699) and by the Fonds de recherche du Québec—Nature et technologies 
(projet de recherche en équipe—256442). Part of the book writing took 
place during my visit at the Mathematical Sciences Research Institute of 
Berkeley in the Spring of 2017 (funded by the National Science Foundation 
under Grant No. DMS-1440140), at the University of Oxford in the Spring 
of 2019 (funded by Ben Green’s Simons Investigator Grant 376201) and at 
the University of Genova in June 2019 (funded by the Istituto Nazionale di 
Alta Matematica “Francesco Severi”). I would like to thank my hosts for 
their support and hospitality. 


Notation 


Throughout the book, we make use of some standard and some less 
standard notation. We list here the most important conventions. 

The symbols $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{C}$ denote the sets of natural numbers (we
do not include zero in $\mathbb{N}$), integers, rational numbers, real numbers and
complex numbers, respectively. Furthermore, given an integer $n\ge1$, we
write $\mathbb{Z}/n\mathbb{Z}$ for the set of residues mod $n$, as well as $(\mathbb{Z}/n\mathbb{Z})^*$ for the set of
reduced residues mod $n$.


We write P to indicate a probability measure, and E[X] and V[X] for 
the expectation and the variance, respectively, of a random variable X. 


Given a set of real numbers $A$ and a parameter $y$, we write $A_{\le y}$ for the
set of numbers $a\in A$ that are $\le y$; similarly for $A_{<y}$, $A_{\ge y}$ and $A_{>y}$. We also
write $|A|$ or $\#A$ for the cardinality of $A$, whichever is more convenient.


The letter $p$ always denotes a prime, and the letter $n$ always denotes an
integer (usually, a natural number). We write $d\mid n$ to mean that $d$ divides $n$,
and $p^k\,\|\,n$ to mean that $p^k$ is the exact power of $p$ dividing $n$. Lastly,
$d\mid n^\infty$ means that all prime factors of $d$ appear in the factorization of $n$ too.
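The two divisibility notions above can be checked mechanically. The following is a minimal Python sketch; the helper names `exact_power` and `divides_power_of` are hypothetical and chosen only for illustration. The second test repeatedly strips from $d$ its common factors with $n$, so $d\mid n^\infty$ holds exactly when nothing of $d$ survives.

```python
from math import gcd

def exact_power(p: int, n: int) -> int:
    """Return k such that p^k || n: p^k divides n but p^(k+1) does not (n != 0)."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def divides_power_of(d: int, n: int) -> bool:
    """Test d | n^infinity, i.e. every prime factor of d divides n (d, n >= 1)."""
    # Each division removes at least one copy of a prime shared by d and n;
    # the loop stops once d has no prime factor in common with n.
    while (g := gcd(d, n)) > 1:
        d //= g
    return d == 1

print(exact_power(2, 12), divides_power_of(8, 6), divides_power_of(10, 6))
```

For instance, $2^2\,\|\,12$, so `exact_power(2, 12)` is 2, while $8\mid 6^\infty$ holds and $10\mid 6^\infty$ fails because $5\nmid 6$.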


When we write $(a,b)$, we might mean the open interval with endpoints
$a$ and $b$, the pair of $a$ and $b$, or the greatest common divisor of the integers
$a$ and $b$. The meaning will always be clear from the context. Similarly, the
symbol $[a,b]$ will sometimes denote the closed interval with endpoints $a$ and
$b$, and other times the least common multiple of the integers $a$ and $b$.

We write $P^+(n)$ and $P^-(n)$ to denote the largest and smallest prime
factors of $n$, respectively, with the convention that $P^+(1)=1$ and $P^-(1)=\infty$.
Given a parameter $y$ and an integer $n\ge1$, we say that $n$ is $y$-smooth
if all its prime factors are $\le y$ (i.e., if $P^+(n)\le y$). The set of $y$-smooth
numbers is denoted by $S(y)$. Lastly, we say that $n$ is $y$-rough if all its prime
factors are $>y$ (i.e., if $P^-(n)>y$). Equivalently, $(n,P(y))=1$, where
$P(y)=\prod_{p\le y}p$.
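As a sanity check of these conventions, here is a small Python sketch using naive trial division, which is adequate only for small $n$. The function names (`P_plus`, `P_minus`, `is_smooth`, `is_rough`, `primorial`) are hypothetical. It encodes $P^+(1)=1$ and $P^-(1)=\infty$, and lets one test the stated equivalence between $y$-roughness and $(n,P(y))=1$.

```python
import math

def prime_factors(n: int) -> list[int]:
    """Distinct prime factors of n >= 1, in increasing order (trial division)."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)  # the leftover cofactor is a prime larger than sqrt(n)
    return factors

def P_plus(n: int) -> float:
    """Largest prime factor of n, with the convention P^+(1) = 1."""
    fs = prime_factors(n)
    return fs[-1] if fs else 1

def P_minus(n: int) -> float:
    """Smallest prime factor of n, with the convention P^-(1) = infinity."""
    fs = prime_factors(n)
    return fs[0] if fs else math.inf

def is_smooth(n: int, y: float) -> bool:
    return P_plus(n) <= y  # all prime factors are <= y

def is_rough(n: int, y: float) -> bool:
    return P_minus(n) > y  # all prime factors are > y

def primorial(y: int) -> int:
    """P(y) = product of the primes p <= y (the empty product is 1)."""
    return math.prod(p for p in range(2, y + 1) if prime_factors(p) == [p])

print(P_plus(12), P_minus(12), is_smooth(12, 3), is_rough(35, 4))
```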

The symbol log denotes the natural logarithm (base $e$). We also let
$\mathrm{li}(x)=\int_2^x dt/\log t$ denote the logarithmic integral.


Given $x\in\mathbb{R}$, we write $\lfloor x\rfloor$ for its integer part (defined to equal $\max\mathbb{Z}_{\le x}$,
and also called the “floor” of $x$), $\lceil x\rceil$ for the “ceiling” of $x$ (defined to equal
$\min\mathbb{Z}_{\ge x}$) and $\{x\}$ for the fractional part of $x$ (defined to equal $x-\lfloor x\rfloor$).


Given $\alpha\in\mathbb{R}$, we write $\|\alpha\|$ to denote its distance from the nearest
integer. On the other hand, the norm of a bilinear form is also denoted by $\|\cdot\|$
(see Chapter 25). Finally, if $v\in\mathbb{C}^N$ or $f:\mathbb{N}\to\mathbb{C}$ is an arithmetic
function, we write $\|v\|_2$ and $\|f\|_2$ for their $\ell^2$-norms.
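The conventions for $\lfloor x\rfloor$, $\lceil x\rceil$, $\{x\}$ and $\|\alpha\|$ differ from naive "truncate the decimals" intuition for negative numbers. The following minimal Python sketch illustrates this. It relies on Python's `math.floor` and `math.ceil`, which return $\max\mathbb{Z}_{\le x}$ and $\min\mathbb{Z}_{\ge x}$; the helpers `frac` and `dist_to_nearest_int` are hypothetical names.

```python
import math

def frac(x: float) -> float:
    """Fractional part {x} = x - floor(x); it lies in [0, 1), also for x < 0."""
    return x - math.floor(x)

def dist_to_nearest_int(alpha: float) -> float:
    """||alpha||, the distance from alpha to the nearest integer (in [0, 1/2])."""
    f = frac(alpha)
    return min(f, 1.0 - f)

for x in (2.7, -1.25):
    print(x, math.floor(x), math.ceil(x), frac(x), dist_to_nearest_int(x))
```

For $x=-1.25$ one gets $\lfloor x\rfloor=-2$, $\lceil x\rceil=-1$, $\{x\}=0.75$ and $\|x\|=0.25$.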


The symbol $C^k(X)$, where $X\subseteq\mathbb{R}$ and $k\in\mathbb{Z}_{\ge0}\cup\{\infty\}$, denotes the set
of functions $f:X\to\mathbb{C}$ whose first $k$ derivatives exist and are continuous.


We write $1_E$ to denote the indicator function of a set or of an event $E$.
For example, $1_{[0,1]}$ denotes the indicator function of the interval $[0,1]$ and
$1_{(n,10)=1}$ denotes the indicator function of the event that $n$ is coprime to 10.
In particular, $1_{\mathcal{P}}$ will denote the indicator function of the set of primes.


The letter $s$ will usually denote a complex number, in which case we
denote its real part by $\sigma$ and its imaginary part by $t$, following Riemann’s
original notation that has now become standard. In addition, non-trivial
zeroes of the Riemann zeta function and of Dirichlet $L$-functions will be
denoted by $\rho=\beta+i\gamma$. Notice that we also use the letter $\gamma$ for the
Euler-Mascheroni constant, whereas $\rho(u)$ will also refer to the Dickman-de Bruijn
function. The precise meaning of each letter will be clear from the context.


We frequently employ the usual asymptotic notation $f=O(g)$, $f\ll g$,
$f\asymp g$, $f\sim g$ and $f=o(g)$, whose precise definition is given in Chapter 1.


Finally, we list below some other symbols and the page of their definition: 


$1_{\mathcal{P}}(n)$  xii
$B(u)$  150
$e(\alpha)$  102
$G(x)$
$\mathrm{li}(x)$
$L(s,\chi)$  97
$P(y)$
$P^+(n)$
$S(\mathcal{A},\mathcal{P})$
$S(y)$
$\Gamma(s)$
$\zeta(s)$
$\Lambda(n)$  37
$\mu(n)$
$\pi(x)$  1
$\pi(x;q,a)$  4
$\rho(u)$  152
$\tau(n)$
$\tau_k(n)$ ($k\in\mathbb{N}$)
$\tau_k(n)$ ($k\in\mathbb{C}$)  131
$\varphi(n)$
$\chi(n)$, $\chi_0(n)$
$\psi(x;q,a)$
$\psi(x,\chi)$
$\Psi(x,y)$
$\omega(n)$, $\Omega(n)$


And then there were 
infinitely many 


Ever since Euclid’s proof of the infinitude of prime numbers, the distri- 
bution of these fundamental objects has fascinated mathematicians. Unlike 
other special sets of integers that have a very regular structure, such as 
the set of perfect squares, the primes do not follow any apparent pattern. 
Consequently, guessing the exact location of the nth smallest prime number 
seems to be an impossible challenge as $n$ grows larger and larger.¹


Since the sequence of primes appears to be so chaotic, we can set the
more modest goal of understanding the approximate location of the
$n$th smallest prime, which we denote by $p_n$. Equivalently, we seek a good
approximation for the counting function of prime numbers

$$\pi(x)=\#\{p\le x\}.$$

Indeed, we have that $\pi(p_n)=n$, so that any approximation of $\pi(x)$ can be
immediately translated to an approximation of $p_n$, and vice versa.


The study of the distribution of primes preoccupied the young Gauss.
After examining tables of large primes, he observed that their density around
$x$ is about $1/\log x$. Translated into the language of Calculus, this means that
a good approximation for $\pi(x)$ is given by the logarithmic integral

$$\mathrm{li}(x)=\int_2^x\frac{dt}{\log t},$$


¹Even the simpler question of deciding whether a given large integer is prime proved to
be a very hard challenge. It was only in 2004 that Agrawal, Kayal and Saxena constructed
a deterministic algorithm that solves this problem in polynomial time without relying on any
unproven conjectures.


Using L’Hôpital’s rule, we find that

$$\lim_{x\to\infty}\frac{\mathrm{li}(x)}{x/\log x}=1,$$

so that Gauss’s guess implies that $\pi(x)$ is approximately equal to $x/\log x$
for large $x$. Symbolically, we write

$$\pi(x)\sim\frac{x}{\log x}\qquad(x\to\infty),\tag{0.1}$$

meaning that the ratio of these two functions tends to 1 as $x\to\infty$. This
notation will be discussed at greater length in Chapter 1. Equivalently,
Gauss’s guess (0.1) says that $p_n\sim n\log n$ as $n\to\infty$.
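To see (0.1) and its equivalent form $p_n\sim n\log n$ numerically, one can sieve the primes up to $10^6$. The following is a minimal Python sketch; the names `prime_sieve` and `prime_pi` are hypothetical. The ratios approach 1 only slowly, which is consistent with $\mathrm{li}(x)$ being a better approximation than $x/\log x$.

```python
import bisect
import math

def prime_sieve(limit: int) -> list[int]:
    """All primes <= limit, by the sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out p^2, p^2 + p, ... (smaller multiples were already removed).
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [n for n, flag in enumerate(is_prime) if flag]

primes = prime_sieve(10**6)

def prime_pi(x: float) -> int:
    """pi(x) = #{p <= x}, by binary search in the precomputed list."""
    return bisect.bisect_right(primes, x)

for x in (10**3, 10**4, 10**5, 10**6):
    print(x, prime_pi(x), round(prime_pi(x) / (x / math.log(x)), 4))

for n in (10, 100, 1000, 10_000):
    p_n = primes[n - 1]  # the n-th smallest prime
    print(n, p_n, round(p_n / (n * math.log(n)), 4))
```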


It took more than a century to prove Gauss’s conjecture for $\pi(x)$. The
path to the proof was outlined by his student Riemann in his epoch-making
mémoire Über die Anzahl der Primzahlen unter einer gegebenen Grösse
published in 1859. In this work, Riemann explained how $\pi(x)$ is intimately
connected to analytic properties of the function

$$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s},$$

now called the Riemann zeta function. He then proposed a program whose
completion would lead to a profound understanding of the distribution of
prime numbers. In particular, it would establish the existence of a constant
$c>0$ such that

$$|\pi(x)-\mathrm{li}(x)|\le c\sqrt{x}\,\log x\qquad\text{for all }x\ge2,\tag{0.2}$$

a very strong form of confirmation of Gauss’s guess. By 1895, Hadamard
and von Mangoldt had rigorously proved all but one of the steps in Riemann’s
master plan. The last step, however, remains elusive to this date. It is the
famous Riemann Hypothesis that we will discuss in Chapter 8. Nevertheless,
in 1896, Hadamard and de la Vallée Poussin proved a weak form
of the Riemann Hypothesis that was strong enough to lead to a proof of
Gauss’s conjecture, now called the Prime Number Theorem.


Prime Number Theorem. As $x\to\infty$, we have that $\pi(x)\sim x/\log x$.


We will give the proof of this fundamental result in Chapter 8.


Apart from the size of $\pi(x)$, there are many other interesting questions
about prime numbers that concern the existence of various patterns among
them. To understand such patterns, we adopt a probabilistic point of view.

Indeed, the absence of structure in the sequence of primes might lead one
to expect that they behave as if they were random objects. Specifically, in
1936 Cramér proposed to model the statistical properties of prime numbers
as follows: we consider a sequence of random variables $(X_1,X_2,X_3,\dots)$ that


we think of as a model of the indicator function of the primes. That is to say,
$X_n$ models the event that a “randomly chosen” integer $n$ is prime. Hence
$X_n$ must be a Bernoulli random variable (i.e., it only takes the values 0 and
1). In addition, Gauss’s guess that the density of primes around $x$ is $1/\log x$
can be interpreted to mean that the chances of a random integer $n$ being
prime are about $1/\log n$. We thus take

$$\mathbb{P}(X_n=1)=\frac{1}{\log n}\qquad(n\ge3),\tag{0.3}$$

so that $\mathbb{P}(X_n=0)=1-1/\log n$, and, for completeness, we set $X_1=0$ and
$X_2=1$. Finally, since knowledge of the primality of some integer $n$ does
not seem to offer much information about the primality of another integer
$n'$, we assume that the random variables $X_n$ are independent of each other.


The sequence $(X_n)_{n=1}^\infty$ is called Cramér’s model. It naturally gives rise
to the set of “random primes” $\{n\in\mathbb{N}:X_n=1\}$. We denote its elements
by $P_1<P_2<\cdots$. By construction, it easily follows that $P_n\sim n\log n$ with
probability 1 as $n\to\infty$; that is to say, if we fix $\varepsilon>0$, then with probability 1
we have $|P_n-n\log n|\le\varepsilon n\log n$ for all sufficiently large $n$. Actually, more
is true: the analogue of $\pi(x)$ is the random variable

$$\Pi(x)=\#\{n:P_n\le x\}=\sum_{n\le x}X_n.$$


We have that

$$\mathbb{E}[\Pi(x)]=1+\sum_{3\le n\le x}\frac{1}{\log n},$$

which is essentially a Riemann sum of the logarithmic integral $\mathrm{li}(x)$. In
fact, a consequence of Theorem 1.10 below is that $|\mathbb{E}[\Pi(x)]-\mathrm{li}(x)|\le10$.
Similarly, the independence of the random variables $X_n$ implies that

$$\mathbb{V}[\Pi(x)]=\sum_{n\le x}\mathbb{V}[X_n]\sim\frac{x}{\log x}$$

as $x\to\infty$. Applying the law of the iterated logarithm [117], we find that

$$\begin{aligned}|\Pi(x)-\mathrm{li}(x)|&\le\sqrt{(2+\varepsilon)\,\mathbb{V}[\Pi(x)]\,\log\log\big(\mathbb{V}[\Pi(x)]\big)}\\&\sim\sqrt{(2+\varepsilon)\,x(\log\log x)/\log x}\end{aligned}\tag{0.4}$$

almost surely as $x\to\infty$, where $\varepsilon$ is any fixed positive real number. Comparing
this inequality with (0.2), we see that $\Pi(x)$, which is the random model
of $\pi(x)$, satisfies the Riemann Hypothesis with probability 1.
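Cramér's model is easy to simulate. The following Python sketch computes $\mathbb{E}[\Pi(x)]$ and $\mathbb{V}[\Pi(x)]$ exactly from (0.3) and independence, approximates $\mathrm{li}(x)=\int_2^x dt/\log t$ by Simpson's rule (a numerical choice made here, not taken from the text), and draws one realisation of $\Pi(x)$. All function names and the seed are illustrative assumptions. In line with the discussion above, $\mathbb{E}[\Pi(x)]$ stays within a bounded distance of $\mathrm{li}(x)$, while a sample typically fluctuates on the scale of $\sqrt{\mathbb{V}[\Pi(x)]}\approx\sqrt{x/\log x}$.

```python
import math
import random

def li_from_2(x: float, steps: int = 20_000) -> float:
    """li(x) = int_2^x dt/log t via Simpson's rule in u = log t (steps even)."""
    a, b = math.log(2.0), math.log(x)
    h = (b - a) / steps
    total = math.exp(a) / a + math.exp(b) / b
    for i in range(1, steps):
        u = a + i * h
        total += (4 if i % 2 else 2) * math.exp(u) / u  # e^u du = dt, u = log t
    return total * h / 3

def expected_count(x: int) -> float:
    """E[Pi(x)] = 1 + sum_{3 <= n <= x} 1/log n (X_1 = 0 and X_2 = 1 are fixed)."""
    return 1.0 + sum(1.0 / math.log(n) for n in range(3, x + 1))

def variance_count(x: int) -> float:
    """V[Pi(x)] = sum_{3 <= n <= x} (1/log n)(1 - 1/log n), by independence."""
    return sum((1 / math.log(n)) * (1 - 1 / math.log(n)) for n in range(3, x + 1))

def sample_count(x: int, rng: random.Random) -> int:
    """One realisation of Pi(x) = sum_{n <= x} X_n in Cramer's model (x >= 2)."""
    count = 1  # X_2 = 1 deterministically
    for n in range(3, x + 1):
        if rng.random() < 1.0 / math.log(n):  # X_n = 1 with probability 1/log n
            count += 1
    return count

x = 10**5
rng = random.Random(2019)  # fixed seed, so the run is reproducible
print(expected_count(x), li_from_2(x), sample_count(x, rng), math.sqrt(variance_count(x)))
```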


If primes really do behave like a sequence of random variables such as
$(X_n)_{n=1}^\infty$, then we should be able to find all sorts of patterns among them.
For example, there should be many primes of the form $4n+1$, or of the form
$n^2+1$. Moreover, the mutual independence of the variables $X_n$ suggests
that we should be able to make several integers prime simultaneously. For


example, there should be infinitely many $n$ such that the integers $n$ and
$n+2$ are both prime, in which case they are called a pair of twin primes.
Similarly, the triplet $(n,2n+1,n^2+6)$ should have prime coordinates
infinitely often. One should be careful not to take such arguments too far:
the integers $n$ and $n+1$ can be simultaneously prime only when $n=2$,
because at least one of them is even. More subtly, if $n>3$, then we cannot
make $n$ and $n^2+2$ simultaneously prime: if $n$ is prime, then $n\equiv\pm1\pmod 3$, and thus
$n^2+2\equiv0\pmod 3$. In Chapter 17 we will see a way to modify Cramér’s
model so that it takes into account such “local” (i.e., involving congruences)
obstructions to primality.
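The mod-3 obstruction can be confirmed by brute force over a finite range. This Python sketch uses a hypothetical trial-division helper `is_prime`. It finds no prime $n$ with $5\le n<10^4$ for which $n^2+2$ is also prime, and it checks that $n=3$ (with $3^2+2=11$) is the lone exception allowed by the argument.

```python
import math

def is_prime(n: int) -> bool:
    """Trial-division primality test (adequate for the small n used here)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

# For a prime n > 3 we have n = +-1 (mod 3), hence 3 | n^2 + 2 and n^2 + 2 > 3.
counterexamples = [n for n in range(5, 10_000) if is_prime(n) and is_prime(n * n + 2)]
print(counterexamples)  # -> []
print(is_prime(3), is_prime(3 * 3 + 2))  # the exceptional case n = 3
```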


Despite the limitations of Cramér’s model, all indications we have so far 
support the hypothesis that primes behave as if they were random. Through- 
out this book, we will present various results that are in accordance with 
this hypothesis. Specifically, in a later chapter we will prove that there are
infinitely many primes of the form $4n+1$. More generally, we will prove
that every arithmetic progression $qn+a$ contains infinitely many primes, as
long as the obvious necessary condition that a and q are coprime holds. As 
a matter of fact, we will show that primes are equidistributed among these 
reduced arithmetic progressions. 


Prime Number Theorem for arithmetic progressions. Let $q\ge3$, and
let $\varphi(q)=\#(\mathbb{Z}/q\mathbb{Z})^*$ be Euler’s totient function. If $(a,q)=1$, then

$$\pi(x;q,a):=\#\{p\le x:\ p\equiv a\ (\mathrm{mod}\ q)\}\sim\frac{x}{\varphi(q)\log x}\qquad(x\to\infty).$$
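The equidistribution among reduced residue classes can be observed numerically. The following Python sketch is illustrative only: the helper names are hypothetical, and $x=10^6$, $q=10$ are arbitrary choices. It counts primes up to $x$ in each class mod $q$ and compares each count with $x/(\varphi(q)\log x)$. The counts for the four coprime classes come out nearly equal, while the ratio to the prediction is still noticeably above 1, as for $\pi(x)$ itself.

```python
import math
from collections import Counter

def prime_sieve(limit: int) -> list[int]:
    """All primes <= limit, by the sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [n for n, flag in enumerate(is_prime) if flag]

def euler_phi(q: int) -> int:
    """phi(q) = #(Z/qZ)^*, counted directly from the definition."""
    return sum(1 for a in range(1, q + 1) if math.gcd(a, q) == 1)

def pi_mod(x: int, q: int) -> Counter:
    """Counter mapping each residue a mod q to pi(x; q, a)."""
    return Counter(p % q for p in prime_sieve(x))

x, q = 10**6, 10
counts = pi_mod(x, q)
prediction = x / (euler_phi(q) * math.log(x))
for a in range(q):
    if math.gcd(a, q) == 1:
        print(a, counts[a], round(counts[a] / prediction, 3))
```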

On the other hand, it is not known to this day whether there are infinitely
many pairs of twin primes. We do have two partial substitutes for this
conjecture: Chen [24, 25] proved that there are infinitely many primes $p$
such that $p+2$ is the product of at most two primes. We will prove a weaker
version of Chen’s theorem in Chapter 18. In addition, Zhang [188] proved
that there is some $h\in\mathbb{N}$ such that the tuple $(n,n+1,\dots,n+h)$ contains at
least two primes for infinitely many integers $n$. Maynard and Tao
improved this result by showing that for each $m$ there is some $h=h(m)$ such
that the tuple $(n,n+1,\dots,n+h)$ contains at least $m$ primes for infinitely
many $n$. We will present the results of Zhang-Maynard-Tao in Chapter 28.


Substantial progress has also been made on the existence of arbitrarily 
long arithmetic progressions among primes: in 2008, Green and Tao [77]
proved that for each $k\ge2$, there are infinitely many integers $n$ and $d\ge1$ such
that the numbers $n,n+d,\dots,n+kd$ are all prime. We will prove the
case $k=2$ of this result in Chapter 24 (a case that essentially goes back to work
of I. M. Vinogradov).


On the contrary, prime values of non-linear polynomials remain a
mystery: there is not a single example of a univariate polynomial of degree
at least 2 that provably takes prime values infinitely often. However, we
have robust methods of bounding from above the frequency with which a
given polynomial takes prime values, as we will see in a later chapter. In
addition, there has been significant progress on multivariate polynomials in
recent years, starting with the work of Friedlander and Iwaniec and of
Heath-Brown [98], and continuing with its extensions due to Heath-Brown,
Li, Maynard and Moroz [99, 101, 147].


As the above discussion shows, our knowledge about primes is rather 
sporadic, and the deeper and more complex properties of these fundamental 
objects seem to escape us despite the collective efforts of mathematicians 
since the time of Euclid. Proving that prime numbers behave pseudoran- 
domly and can be located inside interesting arithmetic sequences is one of 
the holy grails of analytic number theory. The purpose of this book is to 
present some of our best tools towards this grand goal. 


Part 1 


First principles 


Chapter 1 


Asymptotic estimates 


The functions we encounter in number theory are often irregular. It is
then desirable to approximate them by simpler functions that are easier to
analyze. As an example, consider the function $f(x)$ that counts the number
of integers in the interval $[1,x]$, where $x\ge1$. We can easily see that $f$ is a step
function with jumps of size 1 at all integers. This function may be written
in terms of a more familiar function: the integer part of $x$, denoted by $\lfloor x\rfloor$.
This is the unique integer satisfying the inequalities $\lfloor x\rfloor\le x<\lfloor x\rfloor+1$. It
is then clear that $f(x)=\lfloor x\rfloor$, whence $f(x)=x+E(x)$ for some function
$E(x)$ that is bounded by 1 in absolute value. We have thus approximated
the step function $f(x)=\lfloor x\rfloor$ by the smooth function $x$, and the remainder
term in this approximation is a bounded function. We express this via the
asymptotic formula

$$\lfloor x\rfloor=x+O(1).\tag{1.1}$$


In general, given complex-valued functions $f,g$ and $h$, and a subset $I$ of
their domain of definition, we write

$$f(x)=g(x)+O(h(x))\qquad(x\in I)\tag{1.2}$$

and read “$f(x)$ equals $g(x)$ plus big-Oh of $h(x)$” if there is a constant
$c=c(f,g,h)$ such that

$$|f(x)-g(x)|\le c\cdot h(x)\qquad\text{for each }x\in I.$$

We will often refer to $g(x)$ as the main term of (1.2), and to $O(h(x))$ as the
remainder term or error term. In addition, we will often call the constant $c$
absolute to mean that it does not depend on the argument of the functions
$f,g$ and $h$, nor on various other parameters that might be present.




Notice that in (1.1), the difference $x-\lfloor x\rfloor$ is the fractional part of $x$,
denoted by $\{x\}$. However, it turns out that it is often simpler to ignore the
exact value of the remainder term and to only keep track of the fact that it
is a bounded function. Suppose, for example, that we want to approximate
the expression $\sum_{n\le x}\lfloor n\sqrt2\rfloor$. Applying (1.1) to each of the $\lfloor x\rfloor$ summands,
we find that

$$\sum_{n\le x}\lfloor n\sqrt2\rfloor=\sum_{n\le x}\big(n\sqrt2+O(1)\big)=\sqrt2\sum_{n\le x}n+O(x),\tag{1.3}$$

since the total error is the sum of $\lfloor x\rfloor$ functions of size $O(1)$. On the other
hand, we know that $1+2+\cdots+N=N(N+1)/2$. Applying this with
$N=\lfloor x\rfloor=x+O(1)$, we conclude that

$$\sum_{n\le x}\lfloor n\sqrt2\rfloor=\sqrt2\cdot\frac{(x+O(1))\cdot(x+O(1))}{2}+O(x)=\frac{\sqrt2}{2}x^2+O(x)$$

for $x\ge1$, since $O(x)+O(x)+O(1)=O(x)$ when $x\ge1$. Indeed, the
notation $O(x)+O(x)+O(1)$ denotes a sum $f_1(x)+f_2(x)+f_3(x)$ for which
there are absolute constants $c_1,c_2,c_3>0$ such that $|f_j(x)|\le c_jx$ for $j=1,2$
and $|f_3(x)|\le c_3$. Hence, $|f_1(x)+f_2(x)+f_3(x)|\le(c_1+c_2+c_3)x$ for $x\ge1$.
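The estimate just derived can be tested numerically. This is a small Python sketch with hypothetical helper names. It evaluates $\lfloor n\sqrt2\rfloor$ exactly as `math.isqrt(2*n*n)`, since $n\sqrt2=\sqrt{2n^2}$, which keeps floating-point rounding out of the floors. It then prints the normalized remainder $\big(\sum_{n\le x}\lfloor n\sqrt2\rfloor-\tfrac{\sqrt2}{2}x^2\big)/x$, which should stay bounded if the error term is indeed $O(x)$.

```python
import math

def floor_sum_sqrt2(x: int) -> int:
    """sum_{1 <= n <= x} floor(n * sqrt(2)), computed exactly with integers."""
    # floor(n * sqrt(2)) = floor(sqrt(2 n^2)) = isqrt(2 n^2)
    return sum(math.isqrt(2 * n * n) for n in range(1, x + 1))

def normalized_error(x: int) -> float:
    """(S(x) - (sqrt(2)/2) x^2) / x; bounded in x if the remainder is O(x)."""
    return (floor_sum_sqrt2(x) - math.sqrt(2) / 2 * x * x) / x

for x in (10, 100, 1_000, 10_000, 100_000):
    print(x, round(normalized_error(x), 4))
```

For integer $x$ one can check by hand that the normalized remainder equals $\sqrt2/2$ minus the average of $\{n\sqrt2\}$ over $n\le x$, so it lies in $(\sqrt2/2-1,\sqrt2/2]$.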


Remark 1.1. As we see in the above example, the power of the asymptotic
notation is that it allows us to turn inequalities into equalities, and it is
thus amenable to algebraic manipulations. Beware though that the rules of
addition and multiplication change when we use asymptotic notation. For
example, $O(1)+O(1)=O(1)$, since the sum of two bounded functions is also
bounded. Similarly, we have $O(1)\cdot O(1)=O(1)$ and $O(1)-O(1)=O(1)$. On
the other hand, if we sum an unbounded number of bounded functions (as
in (1.3)), the error term must reflect this by growing linearly in the number
of summands.


The asymptotic notation also allows us to compare the order of
magnitude of different functions: if

$$f(x)=O(g(x))\qquad(x\in I),\tag{1.4}$$

we say that “$f$ has order of magnitude smaller than or equal to that of $g$ in $I$”. Often,
we express this relation using Vinogradov’s notation

$$f(x)\ll g(x)\qquad(x\in I),$$

which has the exact same meaning as (1.4).
If $f(x)\ll g(x)$ and $g(x)\ll f(x)$ for $x\in I$, we write

$$f(x)\asymp g(x)\qquad(x\in I)$$

and we say that “$f$ and $g$ have the same order of magnitude in $I$”.


Remark 1.2. The range $I$ in which we compare the functions is important.
For instance, $\sqrt{x}\ll x$ when $x\ge1$, but $x\ll\sqrt{x}$ when $x\in(0,1]$.


Remark 1.3. Sometimes, the functions $f$ and $g$ we are comparing depend
on various parameters. It is then possible that the implied constant in the
estimate $f(x)\ll g(x)$ depends on these parameters. If so, we will indicate
this dependence by a subscript. For instance, for each fixed $\varepsilon>0$, we have

$$\log x\ll_\varepsilon x^\varepsilon\qquad(x\ge1).$$
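To see concretely why the subscript $\varepsilon$ is needed, one can compute the best possible implied constant. The following is a standard calculus computation, added here for illustration, with $g(x)=x^{-\varepsilon}\log x$ on $[1,\infty)$:

```latex
g'(x) = x^{-\varepsilon-1}\,\bigl(1-\varepsilon\log x\bigr),
\qquad g'(x)=0 \iff x=e^{1/\varepsilon},
% g increases on [1, e^{1/eps}] and decreases afterwards, hence
\sup_{x\ge 1}\frac{\log x}{x^{\varepsilon}} = g\bigl(e^{1/\varepsilon}\bigr) = \frac{1}{e\,\varepsilon},
\qquad\text{so}\qquad
\log x \le \frac{1}{e\,\varepsilon}\,x^{\varepsilon}\quad (x\ge 1).
```

The optimal constant $1/(e\varepsilon)$ tends to infinity as $\varepsilon\to0^+$, so no single absolute constant works for all $\varepsilon$.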


There are two more related definitions of asymptotic notation that will
be important throughout this book, and they concern the limiting behavior
of functions. We write

$$f(x)\sim g(x)\quad(x\to x_0)\qquad\Longleftrightarrow\qquad\lim_{x\to x_0}\frac{f(x)}{g(x)}=1,$$

where $x_0\in\overline{\mathbb{R}}=\mathbb{R}\cup\{-\infty,+\infty\}$ and $g$ is non-zero in an open neighborhood
of $x_0$. Under the same assumptions, we also introduce the notation

$$f(x)=o(g(x))\quad(x\to x_0)\qquad\Longleftrightarrow\qquad\lim_{x\to x_0}\frac{f(x)}{g(x)}=0$$

or, for brevity,

$$f(x)=o_{x\to x_0}(g(x))\qquad\Longleftrightarrow\qquad\lim_{x\to x_0}\frac{f(x)}{g(x)}=0.$$

Notice that if $f(x)=o(g(x))$ as $x\to x_0$, then $f(x)$ has genuinely smaller
order of magnitude than $g(x)$ in the vicinity of $x_0$.


We give below some examples to illustrate the use of the above asymp- 
totic notation. 


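Example 1.4. Often we have a composite expression that we want to evaluate
asymptotically, such as $\log\lfloor x\rfloor$. Since $\lfloor x\rfloor=x+O(1)$, the Mean Value
Theorem implies that

$$\log\lfloor x\rfloor=\log x+O(1)\cdot\frac1c$$

for some $c$ between $\lfloor x\rfloor$ and $x$. Since $c\ge\lfloor x\rfloor\ge x/2$ when $x\ge1$, we conclude that

$$\log\lfloor x\rfloor=\log x+O(1/x)\quad(x\ge1).$$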


Example 1.5. A simple application of the Mean Value Theorem is some-
times not sufficient because we need more precision in our approximation.
We may then employ Taylor’s theorem. For example, applying it to $\sqrt{1+u}$
with $u=(\log x)/x$, we have

$$\sqrt{x+\log x}=\sqrt x+\frac{\log x}{2\sqrt x}-\frac{\log^2x}{8x^{3/2}}+O\Big(\frac{\log^3x}{x^{5/2}}\Big)\quad(x\ge1).$$
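As a quick numerical sanity check of Example 1.5 (our own illustration, not part of the argument), the Python sketch below compares $\sqrt{x+\log x}$ with the three-term expansion and prints the error divided by $\log^3x/x^{5/2}$, the size of the $O$-term. The helper name `expansion` is hypothetical. The ratio should stay bounded; for large $x$ it approaches $1/16$, the next Taylor coefficient of $\sqrt{1+u}$.

```python
import math


def expansion(x):
    """Three-term expansion of sqrt(x + log x) from Example 1.5."""
    L = math.log(x)
    return math.sqrt(x) + L / (2 * math.sqrt(x)) - L**2 / (8 * x**1.5)


for x in [3.0, 10.0, 100.0, 1000.0]:
    err = math.sqrt(x + math.log(x)) - expansion(x)
    scale = math.log(x) ** 3 / x**2.5  # size of the O(...) term
    print(f"x = {x:6.0f}   error = {err:.3e}   error/scale = {err / scale:.4f}")
```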




Example 1.6. The asymptotic notation can also be used to obtain asymptotic
expansions of integrals that cannot be computed in terms of elementary
functions. As an example, we analyze the logarithmic integral
$\mathrm{li}(x)=\int_2^x dy/\log y$. We integrate by parts repeatedly to find that

$$\begin{aligned}
\mathrm{li}(x)&=\frac{y}{\log y}\Big|_{y=2}^{x}+\int_2^x\frac{dy}{\log^2y}
=\frac{x}{\log x}+O(1)+\int_2^x\frac{dy}{\log^2y}\\
&=\frac{x}{\log x}+\frac{x}{\log^2x}+O(1)+2\int_2^x\frac{dy}{\log^3y}\\
&\;\;\vdots\\
&=\frac{x}{\log x}+\frac{x}{\log^2x}+\frac{2!\,x}{\log^3x}+\cdots+\frac{(N-1)!\,x}{\log^Nx}+O_N(1)+N!\int_2^x\frac{dy}{\log^{N+1}y}.
\end{aligned}$$

The last integral is $\sim x/\log^{N+1}x$ as $x\to\infty$ by L’Hôpital’s rule, so we arrive
at the asymptotic formula

$$\mathrm{li}(x)=\frac{x}{\log x}+\frac{x}{\log^2x}+\frac{2!\,x}{\log^3x}+\cdots+\frac{(N-1)!\,x}{\log^Nx}+O_N\Big(\frac{x}{\log^{N+1}x}\Big)$$

for $x\ge2$.
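To see the asymptotic series for $\mathrm{li}(x)$ at work, here is a small Python sketch (our own illustration; the helper names `li` and `li_series` are hypothetical). It evaluates $\mathrm{li}(x)=\int_2^x dy/\log y$ numerically, assuming that composite Simpson's rule after the substitution $y=e^u$ is accurate enough, and prints the remainder divided by $x/\log^{N+1}x$. By the computation above these ratios tend to $N!$ as $x\to\infty$, but only slowly, which is typical of an asymptotic (rather than convergent) series.

```python
import math


def li(x, steps=200_000):
    """li(x) = integral from 2 to x of dy/log y.

    Substituting y = e^u gives the integral of e^u/u over [log 2, log x],
    which is evaluated with composite Simpson's rule (steps must be even).
    """
    a, b = math.log(2), math.log(x)
    h = (b - a) / steps
    g = lambda u: math.exp(u) / u
    total = g(a) + g(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def li_series(x, N):
    """Truncated expansion sum_{k=1}^{N} (k-1)! x / log^k x."""
    L = math.log(x)
    return sum(math.factorial(k - 1) * x / L**k for k in range(1, N + 1))


x = 10**6
value = li(x)
for N in range(1, 5):
    ratio = (value - li_series(x, N)) / (x / math.log(x) ** (N + 1))
    print(f"N = {N}: (li(x) - series) / (x / log^(N+1) x) = {ratio:.2f}")
```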


Summation by parts 


Many theorems of analytic number theory can be phrased as asymptotic
estimates for the summatory function of a sequence $(a_n)_{n=1}^\infty$ of complex
numbers. This is the function

$$A(x)=\sum_{n\le x}a_n,$$

where $x\in\mathbb R$. For instance, if $a_n=1_{\mathcal P}(n)$ (the indicator function of prime
numbers), then its summatory function $A(x)$ is the counting function of
prime numbers $\pi(x)$.


The simplest case is when $a_n=f(n)$, with $f\in C^1([0,+\infty))$. If $f$ does
not vary too rapidly, it is reasonable to expect that

$$f(n)\approx\int_{n-1}^nf(t)\,dt,\quad\text{whence}\quad\sum_{n\le x}f(n)\approx\int_0^xf(t)\,dt.$$

To make the above heuristic rigorous, we examine how close $f(n)$ is to
$\int_{n-1}^nf(t)\,dt$. We begin by writing

$$f(n)-\int_{n-1}^nf(t)\,dt=\int_{n-1}^n(f(n)-f(t))\,dt.$$




For any constant $c$, we have $dt=d(t-c)$. Integrating by parts yields that

$$f(n)-\int_{n-1}^nf(t)\,dt=(n-1-c)(f(n-1)-f(n))+\int_{n-1}^n(t-c)f'(t)\,dt.$$

If we take $c=n-1$, the “side term” $(n-1-c)(f(n-1)-f(n))$ vanishes.
Since we also have that $t-(n-1)=\{t\}$ for $t\in[n-1,n)$, we conclude that

$$f(n)=\int_{n-1}^nf(t)\,dt+\int_{n-1}^n\{t\}f'(t)\,dt.$$

Summing the above identity over $n=M+1,M+2,\dots,N$, where
$M<N$ are two integers, we arrive at the Euler-Maclaurin summation
formula:

$$\sum_{M<n\le N}f(n)=\int_M^Nf(t)\,dt+\int_M^N\{t\}f'(t)\,dt. \tag{1.5}$$

The first integral is the expected main term, and the second one will be
smaller if $f'$ is of smaller order of magnitude than $f$, which is a quantitative
way of saying that $f$ does not vary too rapidly.


We demonstrate the versatility of the Euler-Maclaurin summation for-
mula with a few examples. When $f(t)=t^2$, we have

$$0\le\int_0^N2t\{t\}\,dt\le\int_0^N2t\,dt=N^2.$$

Consequently,

$$\sum_{n=1}^Nn^2=\int_0^Nt^2\,dt+\int_0^N2t\{t\}\,dt=\frac{N^3}{3}+O(N^2).$$

This should be compared with the well-known exact formula $\sum_{n=1}^Nn^2=N(N+1)(2N+1)/6$.
However, it would be rather hard to guess such an exact
formula for the sum $\sum_{n=1}^Nn^{100}$ (though, see Exercise 1.3). Nevertheless,
adapting the above argument readily implies that

$$\sum_{n=1}^Nn^{100}=\frac{N^{101}}{101}+O(N^{100}).$$
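The bounds $0\le\int_0^N2t\{t\}\,dt\le N^2$ and their analogue for $t^{100}$ can be checked with exact rational arithmetic. The Python sketch below is an illustration we add (the helper `power_sum` is hypothetical); it uses the standard `fractions` module, so no rounding is involved.

```python
from fractions import Fraction


def power_sum(N, k):
    """Exact value of 1^k + 2^k + ... + N^k."""
    return sum(n**k for n in range(1, N + 1))


for N in [1, 10, 100, 1000]:
    s2 = power_sum(N, 2)
    assert s2 == N * (N + 1) * (2 * N + 1) // 6      # the exact formula
    err2 = s2 - Fraction(N**3, 3)                     # equals N^2/2 + N/6
    err100 = power_sum(N, 100) - Fraction(N**101, 101)
    # Euler-Maclaurin: err2 lies in [0, N^2] and err100 lies in [0, N^100]
    print(N, float(err2 / N**2), float(err100 / N**100))
```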


A good exercise on the Euler-Maclaurin summation formula is to check that

$$\sum_{n\le N}\frac1n=\log N+O(1)\sim\log N\quad(N\to\infty), \tag{1.6}$$

which is an estimate on the rate of divergence of the harmonic series. A
more precise formula will be proven in Theorem 1.11 below.
The Euler-Maclaurin summation formula is a special case of a general
identity. To prove this generalization, we introduce a tool called summation
by parts or partial summation, which is a discrete analogue of integration
by parts. As will become clear throughout this book, partial summation
is one of the main workhorses of analytic number theory. It allows us to
pass from estimates on $A(x)=\sum_{n\le x}a_n$ to estimates for general sums of
the form $\sum_{n\le x}a_nf(n)$ with $f$ a continuously differentiable function. The
case when $a_n=1$ for all $n$, for which $A(x)=\lfloor x\rfloor=x+O(1)$, corresponds
to the Euler-Maclaurin formula.


To make the passage from $A(x)$ to $\sum_{n\le x}a_nf(n)$, we use the theory
of Riemann-Stieltjes integration (see Appendix A): note that $A(x)$ is a step
function that is continuous from the right and that has jumps of size $a_n$
at each integer $n$. For any continuous $f$, Theorem A.1(f) implies that

$$\sum_{y<n\le z}a_nf(n)=\int_y^zf(t)\,dA(t), \tag{1.7}$$

where the right-hand side is a Riemann-Stieltjes integral. If we further as-
sume that $f$ is continuously differentiable and integrate by parts (see The-
orem A.1(d)), we arrive at the formula

$$\sum_{y<n\le z}a_nf(n)=A(t)f(t)\Big|_{t=y}^{z}-\int_y^zA(t)f'(t)\,dt. \tag{1.8}$$


Remark 1.7. A more elementary way to prove (1.8) that avoids the use of
Riemann-Stieltjes integrals is to use Abel’s summation formula: if $(a_n)_{n=1}^\infty$
and $(b_n)_{n=1}^\infty$ are two sequences of complex numbers, then

$$\sum_{n=M+1}^Na_nb_n=A_Nb_N-A_Mb_{M+1}-\sum_{n=M+1}^{N-1}A_n(b_{n+1}-b_n) \tag{1.9}$$

for all integers $N>M\ge0$, where $A_n=A(n)=\sum_{j=1}^na_j$ (so that $A_0=0$). Indeed, this can
be proven by noticing that

$$\sum_{n=M+1}^Na_nb_n=\sum_{n=M+1}^N(A_n-A_{n-1})b_n=\sum_{n=M+1}^NA_nb_n-\sum_{n=M}^{N-1}A_nb_{n+1}.$$

In the special case when $b_n=f(n)$, we have that

$$A_n(b_{n+1}-b_n)=\int_n^{n+1}A(x)f'(x)\,dx$$

because $A(x)=A_n$ when $x\in[n,n+1)$. Together with (1.9), this leads us
to (1.8). We leave the details as an exercise.
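Abel's summation formula (1.9) is an exact identity, so it can be checked on random integer data. A minimal Python sketch (the function names `lhs` and `rhs` are ours) that compares both sides of (1.9) for every pair $0\le M<N\le50$:

```python
import random


def lhs(a, b, M, N):
    """Left-hand side of (1.9); lists a and b are indexed from 1 (index 0 unused)."""
    return sum(a[n] * b[n] for n in range(M + 1, N + 1))


def rhs(a, b, M, N):
    """Right-hand side of (1.9), with A_n = a_1 + ... + a_n and A_0 = 0."""
    A = [0] * (N + 1)
    for n in range(1, N + 1):
        A[n] = A[n - 1] + a[n]
    return (A[N] * b[N] - A[M] * b[M + 1]
            - sum(A[n] * (b[n + 1] - b[n]) for n in range(M + 1, N)))


random.seed(1)
N = 50
a = [0] + [random.randint(-9, 9) for _ in range(N)]
b = [0] + [random.randint(-9, 9) for _ in range(N)]
print(all(lhs(a, b, M, n) == rhs(a, b, M, n) for n in range(1, N + 1) for M in range(n)))
```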


Example 1.8. For reasons we will explain later, we often count primes with
a logarithmic weight. To this end, we define Chebyshev’s theta function

$$\theta(x)=\sum_{p\le x}\log p.$$




We may use relation (1.8) to go back and forth between $\pi(x)$ and $\theta(x)$: we
have

$$\theta(x)=\int_1^x(\log y)\,d\pi(y)=\pi(x)\log x-\int_1^x\frac{\pi(y)}{y}\,dy.$$

Similarly, using that $\pi(x)=\sum_{2-\epsilon<p\le x}1$ for each $\epsilon>0$, we have

$$\pi(x)=\int_{2^-}^x\frac{d\theta(y)}{\log y}=\frac{\theta(x)}{\log x}+\int_2^x\frac{\theta(y)}{y\log^2y}\,dy.$$

Note that we replaced $2^-$ by $2$ in the rightmost integral, which is justified
by the fact that $\int_{a^-}^bf=\int_a^bf$ for any function $f$ that is Riemann-integrable on
the interval $[a,b]$.
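Since $\pi$ and $\theta$ are step functions, both integrals in Example 1.8 can be computed exactly interval by interval, which gives a direct check of the two partial-summation identities. The following Python sketch is our own illustration: `primes_up_to` is a plain sieve and `check_partial_summation` is a hypothetical helper name. It returns $\pi(x)$, $\theta(x)$ and the two right-hand sides, which should agree up to rounding.

```python
import math


def primes_up_to(x):
    """Primes <= x by a simple sieve of Eratosthenes."""
    flags = bytearray([1]) * (x + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(2, x + 1) if flags[n]]


def check_partial_summation(x):
    """Return (pi(x), theta(x), RHS of the theta identity, RHS of the pi identity)."""
    ps = primes_up_to(int(x))
    ends = ps[1:] + [x]   # pi(y) = k and theta(y) = theta_k on [ps[k-1], ends[k-1])
    theta = int_pi = int_theta = 0.0
    for k, (p, e) in enumerate(zip(ps, ends), start=1):
        theta += math.log(p)
        int_pi += k * math.log(e / p)                             # integral of pi(y)/y
        int_theta += theta * (1 / math.log(p) - 1 / math.log(e))  # integral of theta/(y log^2 y)
    pi_x = len(ps)
    return pi_x, theta, pi_x * math.log(x) - int_pi, theta / math.log(x) + int_theta


print(check_partial_summation(1000.5))
```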


Often, we have at our disposal an asymptotic formula of the form

$$A(x)=M(x)+R(x)\quad\text{for all }x\ge x_0, \tag{1.10}$$

where $M(x)$ is a continuously differentiable main term that approximates
$A(x)$, and $R(x)$ is the remainder term in this approximation. For instance,
when $a_n=1$ for all $n$, we have $A(x)=\lfloor x\rfloor=x-\{x\}$. Another important
example is when $a_n$ is the indicator function of the primes, for which $A(x)=\pi(x)$.
We then write

$$\pi(x)=\mathrm{li}(x)+R(x), \tag{1.11}$$

with the Prime Number Theorem being equivalent to the estimate $R(x)=o(x/\log x)$
as $x\to\infty$, and with the Riemann Hypothesis yielding the much
stronger estimate $R(x)=O(\sqrt x\log x)$ (see (Q.2)).


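For $z>y\ge x_0$, relations (1.7) and (1.10) imply that

$$\begin{aligned}
\sum_{y<n\le z}a_nf(n)&=\int_y^zf(t)\,d\big(M(t)+R(t)\big)\\
&=\int_y^zf(t)\,dM(t)+\int_y^zf(t)\,dR(t)\\
&=\int_y^zf(t)M'(t)\,dt+R(t)f(t)\Big|_{t=y}^{z}-\int_y^zR(t)f'(t)\,dt,
\end{aligned}\tag{1.12}$$

where we successively applied parts (b), (e) and (d) of Theorem A.1.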


Example 1.9. Write $\pi(x)=\mathrm{li}(x)+R(x)$ as in (1.11). Applying (1.12) with
$a_n=1_{\mathcal P}(n)$, $f(n)=\log n$, $z=x$ and $y=1$, we find that

$$\theta(x)=x-1+R(x)\log x-\int_1^x\frac{R(t)}{t}\,dt.$$

If for large $t$ the remainder $R(t)$ is much smaller than $\mathrm{li}(t)\sim t/\log t$, we see
that a good approximation to $\theta(x)$ is given by $x$ (see Exercise 1.7).




Recall that $A(x)=\lfloor x\rfloor$ when $a_n=1$ for all $n$. Writing $A(x)=x-\{x\}$,
we find that (1.12) implies the following generalization of (1.5).

Theorem 1.10 (Euler-Maclaurin summation formula). If $f\in C^1([y,z])$, then

$$\sum_{y<n\le z}f(n)=\int_y^zf(t)\,dt-\{t\}f(t)\Big|_{t=y}^{z}+\int_y^z\{t\}f'(t)\,dt.$$

In particular, if $f\in C^1([1,+\infty))$, then for every $x\ge1$

$$\sum_{n\le x}f(n)=f(1)+\int_1^xf(t)\,dt-\{x\}f(x)+\int_1^x\{t\}f'(t)\,dt.$$


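We give two important applications of the Euler-Maclaurin formula. We
start with an estimate of the growth of the harmonic series that sharpens
the estimate in (1.6). The symbol $\gamma$ in its statement denotes the Euler-
Mascheroni constant that is defined by

$$\gamma=1-\int_1^\infty\frac{\{t\}}{t^2}\,dt=0.57721\ldots \tag{1.13}$$

Theorem 1.11. For $x\ge1$, we have

$$\sum_{n\le x}\frac1n=\log x+\gamma+O\Big(\frac1x\Big).$$

Proof. By Theorem 1.10, we have that

$$\sum_{n\le x}\frac1n=1+\int_1^x\frac{dt}{t}-\frac{\{x\}}{x}-\int_1^x\frac{\{t\}}{t^2}\,dt.$$

Since $0\le\{t\}/t^2\le1/t^2$, the integral $\int_1^\infty\{t\}t^{-2}\,dt$ converges absolutely. In
addition, $\int_1^xdt/t=\log x$. We may thus write

$$\sum_{n\le x}\frac1n=\log x+\Big(1-\int_1^\infty\frac{\{t\}}{t^2}\,dt\Big)-\frac{\{x\}}{x}+\int_x^\infty\frac{\{t\}}{t^2}\,dt.$$

Moreover, we have the inequalities

$$0\le\frac{\{x\}}{x}<\frac1x\quad\text{and}\quad0\le\int_x^\infty\frac{\{t\}}{t^2}\,dt\le\int_x^\infty\frac{dt}{t^2}=\frac1x,$$

which complete the proof of the theorem.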
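Theorem 1.11 is easy to check numerically. In the Python sketch below (our own illustration), the value of $\gamma$ is hard-coded to 16 digits rather than computed from (1.13); the proof above shows that the error $\sum_{n\le x}1/n-\log x-\gamma=-\{x\}/x+\int_x^\infty\{t\}t^{-2}\,dt$ lies strictly between $-1/x$ and $1/x$.

```python
import math

EULER_GAMMA = 0.5772156649015329   # known value of the Euler-Mascheroni constant


def harmonic(x):
    """Sum of 1/n over 1 <= n <= x (x need not be an integer)."""
    return math.fsum(1 / n for n in range(1, math.floor(x) + 1))


for x in [1, 2.5, 10, 1000.7, 10**6]:
    err = harmonic(x) - math.log(x) - EULER_GAMMA
    print(f"x = {x}: error = {err:+.3e}, x * error = {x * err:+.4f}")
```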


A more involved application of Theorem 1.10 is given by Stirling’s approximation
for the factorial function.

Theorem 1.12 (Stirling’s formula). For $n\in\mathbb N$, we have

$$n!=\Big(\frac ne\Big)^n\sqrt{2\pi n}\,\big(1+O(1/n)\big).$$




Proof. Taking logarithms and applying the Euler-Maclaurin formula, we
find that

$$\begin{aligned}
\log(n!)=\sum_{j=1}^n\log j&=\int_1^n\log x\,dx+\log1-\{n\}\log n+\int_1^n\frac{\{x\}}{x}\,dx\\
&=n\log n-n+1+\int_1^n\frac{\{x\}}{x}\,dx,
\end{aligned}$$

since $n\in\mathbb N$ here and thus $\{n\}=0$. Next, set

$$F(x)=\int_0^x(\{t\}-1/2)\,dt.$$

Since $\{t\}-1/2$ is a 1-periodic function of mean 0 over a complete period,
we find that $F$ is also 1-periodic. In particular, $F(n)=0$ for all $n\in\mathbb N$, and
$F(x)=O(1)$ for all $x\ge1$. Integration by parts implies that

$$\int_1^n\frac{\{t\}}{t}\,dt=\int_1^n\frac{dt}{2t}+\int_1^n\frac{\{t\}-1/2}{t}\,dt=\frac{\log n}{2}+\frac{F(t)}{t}\Big|_{t=1}^{n}+\int_1^n\frac{F(t)}{t^2}\,dt=\frac{\log n}{2}+\int_1^n\frac{F(t)}{t^2}\,dt.$$

(Justify why we can integrate by parts even though $F$ is not differentiable
everywhere.) The integral $\int_1^\infty F(t)t^{-2}\,dt$ converges absolutely by the estimate
$F(t)=O(1)$, and its tails satisfy the bound

$$\Big|\int_n^\infty\frac{F(t)}{t^2}\,dt\Big|\le\int_n^\infty\frac{|F(t)|}{t^2}\,dt\ll\int_n^\infty\frac{dt}{t^2}=\frac1n.$$

This proves that

$$\log(n!)=(n+1/2)\log n-n+c+O(1/n),\quad\text{where}\quad c=1+\int_1^\infty\frac{F(t)}{t^2}\,dt.$$

Since $e^{O(1/n)}=1+O(1/n)$ by Taylor’s theorem, the proof will be complete
provided that we can show that $e^c=\sqrt{2\pi}$. Establishing this identity requires
different means, outlined in Exercise 1.11 below. Alternatively, see Theorem 1.13.
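Numerically, the quantity $c_n=\log(n!)-(n+1/2)\log n+n$ from the proof of Theorem 1.12 settles down quickly, and its exponential is close to $\sqrt{2\pi}$, in line with the identity $e^c=\sqrt{2\pi}$. This Python sketch (ours) uses `math.lgamma(n + 1)` for $\log(n!)$; the helper name `stirling_c` is hypothetical. The difference $c_n-\log\sqrt{2\pi}$ behaves like $1/(12n)$, consistent with the $O(1/n)$ error term.

```python
import math


def stirling_c(n):
    """log(n!) - (n + 1/2) log n + n, which tends to the constant c of Theorem 1.12."""
    return math.lgamma(n + 1) - (n + 0.5) * math.log(n) + n


half_log_2pi = 0.5 * math.log(2 * math.pi)
for n in [1, 10, 100, 10**4]:
    c_n = stirling_c(n)
    print(f"n = {n:>5}: c_n = {c_n:.10f}, exp(c_n) = {math.exp(c_n):.10f}, "
          f"12n(c_n - log sqrt(2 pi)) = {12 * n * (c_n - half_log_2pi):.6f}")
print("sqrt(2*pi) =", math.sqrt(2 * math.pi))
```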


The saddle-point method 


One of the most useful methods for obtaining asymptotic evaluations of in-
tegrals is the saddle-point method.¹ In its simplest form, which we present here,
it is also called Laplace’s method, and it is used to evaluate asymptotically
integrals of the form $\int_a^be^f$, where $f:[a,b]\to\mathbb R$ is a function. In practice,
$f$ depends on some parameters and our goal is to estimate $\int_a^be^f$ in terms of
these parameters.

¹Other names for it are “method of the steepest descent” and “stationary-phase method”.




If $f$ has a unique maximum in $[a,b]$, say at $c$, then we may expect that
most of the mass of the integral $\int_a^be^{f(x)}\,dx$ comes from values of $x$ around
$c$. If $c\in(a,b)$ and $f$ is a smooth function, then $c$ is a stationary point of $f$,
that is to say, $f'(c)=0$. Moreover, if $f''(c)$ does not vanish, then it must
be negative by the maximality of $f(c)$. Using quadratic approximation, we
find that

$$f(x)\approx f(c)-\frac{|f''(c)|}{2}(x-c)^2,$$

so that we might expect that

$$\int_a^be^{f(x)}\,dx\approx\int_a^be^{f(c)-|f''(c)|(x-c)^2/2}\,dx. \tag{1.14}$$

The integrand on the right-hand side of the above formula decays fast when
$x$ moves away from $c$. Hence, it seems reasonable to expect that

$$\int_a^be^{f(x)}\,dx\approx\int_{-\infty}^{\infty}e^{f(c)-|f''(c)|(x-c)^2/2}\,dx=e^{f(c)}\sqrt{\frac{2\pi}{|f''(c)|}},$$

where we used the identity $\int_{-\infty}^\infty e^{-y^2/2}\,dy=\sqrt{2\pi}$.
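A simple test case for this heuristic is $f(x)=\lambda\cos x$ on $[-\pi/2,\pi/2]$, which has its maximum at $c=0$ with $f(c)=\lambda$ and $|f''(c)|=\lambda$, so the prediction is $e^\lambda\sqrt{2\pi/\lambda}$. The Python sketch below is an illustration under our own choices (this test function, the helper names, and Simpson's rule for the reference value of the integral); both quantities are divided by $e^\lambda$ to avoid overflow. The ratio approaches 1 as $\lambda$ grows, with a relative error of order $1/\lambda$.

```python
import math


def simpson(g, a, b, steps=20_000):
    """Composite Simpson's rule for the integral of g over [a, b] (steps even)."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def laplace_ratio(lam):
    """(integral of e^{lam cos x} over [-pi/2, pi/2]) / (e^lam sqrt(2 pi / lam)),
    computed after dividing both quantities by e^lam."""
    integral = simpson(lambda x: math.exp(lam * (math.cos(x) - 1)), -math.pi / 2, math.pi / 2)
    return integral / math.sqrt(2 * math.pi / lam)


for lam in [10, 100, 1000]:
    print(lam, laplace_ratio(lam))   # tends to 1 as lam grows
```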

There are various subtleties and technicalities that we left out of the 
above discussion. Rather than trying to prove an abstract and general the- 
orem that establishes rigorously the above formula, we demonstrate how to 
handle all the necessary details in a concrete example. 


Our goal is to study the asymptotic behavior of Euler’s Gamma function,
which is defined for $\mathrm{Re}(s)>0$ by the formula

$$\Gamma(s)=\int_0^\infty e^{-x}x^{s-1}\,dx.$$


This function extends the usual factorial function. Indeed, noticing that
$e^{-x}=(-e^{-x})'$ and integrating by parts, we deduce the functional equation
of the Gamma function

$$\Gamma(s+1)=s\Gamma(s), \tag{1.15}$$

valid whenever $\mathrm{Re}(s)>0$. Since $\Gamma(1)=\int_0^\infty e^{-x}\,dx=1$, iterating this formula implies that
$\Gamma(n+1)=n!$ for all $n\in\mathbb Z_{\ge0}$. Moreover, (1.15) can be used to meromorphically continue
$\Gamma$ to the entire complex plane: applying it $n+1$ times, we deduce that

$$\Gamma(s)=\frac{\Gamma(s+n+1)}{s(s+1)\cdots(s+n)}, \tag{1.16}$$

which can be taken as the definition of $\Gamma$ for $\mathrm{Re}(s)>-n-1$. It is clear
from this formula that the only singularities of $\Gamma$ are simple poles at $0,-1,-2,\dots$
with $\mathrm{res}_{s=-n}\Gamma(s)=(-1)^n/n!$.²
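The facts $\Gamma(n+1)=n!$ and $\mathrm{res}_{s=-n}\Gamma(s)=(-1)^n/n!$ are easy to probe with Python's built-in `math.gamma`, which accepts negative non-integer arguments. In this sketch (our own), the residue is approximated by $(s+n)\Gamma(s)$ at a point $s$ very close to $-n$; the offset `1e-8` is an arbitrary choice.

```python
import math

for n in range(6):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))   # Gamma(n+1) = n!
    s = -n + 1e-8
    approx_residue = (s + n) * math.gamma(s)   # (s + n) Gamma(s) near the pole s = -n
    print(n, approx_residue, (-1) ** n / math.factorial(n))
```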

Let us now use the saddle-point method to estimate $\Gamma(s)$ when $s\ge1$.
We note that

$$\Gamma(s)=\frac{\Gamma(s+1)}{s}=\frac1s\int_0^\infty e^{f(x)}\,dx,\quad\text{where}\quad f(x)=-x+s\log x. \tag{1.17}$$

We have $f'(x)=-1+s/x$ and $f''(x)=-s/x^2$. In particular, the function
$f$ has a unique maximum in the positive reals at $x=s$, where $f(s)=s\log(s/e)$
and $|f''(s)|=1/s$. If we can show that the integral in (1.17) is dominated by
values of $x$ very close to $s$ and carry out the argument leading to (1.14), we
will deduce that

$$\Gamma(s)\sim(s/e)^s\sqrt{2\pi/s}\quad(s\to\infty),$$

which is a generalization of Stirling’s formula for the factorial function.


In order to establish the above formula rigorously, we begin by showing
that we may truncate the range of integration in (1.17) to values of $x$ that
are very close to $s$. This is done in two stages.

First, we show we can discard the portion of the integral over $[2s,+\infty)$
at the cost of a small error term. Indeed, for each $x\ge2s$, we have $f'(x)\le-1/2$.
Hence, the Mean Value Theorem implies that $f(x)\le f(2s)-(x-2s)/2$. Consequently,

$$\int_{2s}^\infty e^{f(x)}\,dx\le2e^{f(2s)}=2e^{f(s)+(\log2-1)s}$$

and thus, since $\log2-1<-1/4$,

$$\Gamma(s)=\frac1s\int_0^{2s}e^{f(x)}\,dx+O\big(e^{f(s)-s/4}\big).$$


Next, we show we can discard the portion of the integral over $E:=\{x\in(0,2s]:|x-s|>s^{2/3}\}$.
Indeed, for all $x\in E$, Taylor’s theorem implies there
is some $c\in(0,2s]$ such that $f(x)=f(s)+(x-s)f'(s)+(x-s)^2f''(c)/2$.
We have $f'(s)=0$, $f''(c)=-s/c^2\le-1/(4s)$ and $(x-s)^2>s^{4/3}$. Therefore,

$$\int_Ee^{f(x)}\,dx\le\int_0^{2s}e^{f(s)-s^{1/3}/8}\,dx=2se^{f(s)-s^{1/3}/8}.$$

In conclusion, we have the asymptotic formula

$$\Gamma(s)=\frac1s\int_{|x-s|\le s^{2/3}}e^{f(x)}\,dx+O\big(e^{f(s)-s^{1/3}/10}\big). \tag{1.18}$$


²Recall that a function $f$ that is analytic in the punctured disk $\{z\in\mathbb C:0<|z-a|<r\}$
has a Laurent series expansion $f(z)=\sum_{n=-\infty}^\infty c_n(z-a)^n$ about $a$. Its residue at $a$ is defined to be
$c_{-1}$ and is denoted by $\mathrm{res}_{z=a}f(z)$. If there is an integer $m\ge1$ such that $c_{-m}\ne0$ and $c_n=0$ for
$n<-m$, then we say that $f$ has a pole of order $m$ at $a$. The pole is called simple when $m=1$.
A simple way to check whether $f$ has a simple pole at $a$ is to compute $\lim_{z\to a}(z-a)f(z)$. If this
limit exists and is non-zero, then $f$ has a simple pole at $z=a$ of residue $c_{-1}=\lim_{z\to a}(z-a)f(z)$.




Consider now the portion of the integral with $|x-s|\le s^{2/3}$. For such
$x$, we have $f'''(x)=2s/x^3=O(1/s^2)$, and thus

$$f(x)=s\log(s/e)-\frac{(x-s)^2}{2s}+O\Big(\frac{|x-s|^3}{s^2}\Big)$$

by Taylor’s theorem. Consequently,

$$\begin{aligned}
e^{f(x)}&=(s/e)^se^{-(x-s)^2/(2s)+O(|x-s|^3/s^2)}\\
&=(s/e)^se^{-(x-s)^2/(2s)}\big(1+O(|x-s|^3/s^2)\big),
\end{aligned}$$

where we used the formula $e^\delta=1+O(\delta)$ for bounded values of $\delta$ (here
$|x-s|^3/s^2\le1$), a consequence of the Mean Value Theorem. We make the change of variables
$x=s+y\sqrt s$ to find that

$$\int_{|x-s|\le s^{2/3}}e^{f(x)}\,dx=(s/e)^s\sqrt s\int_{|y|\le s^{1/6}}e^{-y^2/2}\big(1+O(|y|^3/\sqrt s)\big)\,dy.$$

Since $\int_{-\infty}^\infty e^{-y^2/2}\,dy=\sqrt{2\pi}$ and $\int_{-\infty}^\infty|y|^3e^{-y^2/2}\,dy<\infty$, we conclude that

$$\int_{|x-s|\le s^{2/3}}e^{f(x)}\,dx=(s/e)^s\sqrt s\big(\sqrt{2\pi}-R+O(1/\sqrt s)\big), \tag{1.19}$$

where, since $y^2\ge|y|$ for $|y|\ge1$,

$$R=\int_{|y|>s^{1/6}}e^{-y^2/2}\,dy\le\int_{|y|>s^{1/6}}e^{-|y|/2}\,dy=4e^{-s^{1/6}/2}.$$

Together with (1.18), the above estimates imply that

$$\Gamma(s)=(s/e)^s\sqrt{2\pi/s}\,\big(1+O(1/\sqrt s)\big)\quad(s\ge1), \tag{1.20}$$

thus generalizing Theorem 1.12, only with a weaker error term.
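The truncation in (1.18) can also be tested numerically. The Python sketch below (our illustration; the helper name and the use of Simpson's rule are our choices) computes $\frac1s\int_{|x-s|\le s^{2/3}}e^{f(x)}\,dx$ with $f(x)=-x+s\log x$ and divides it by $\Gamma(s)$, taken from `math.lgamma`; both quantities are first divided by $e^{f(s)}=(s/e)^s$. Since $e^f>0$, the ratio is at most 1. It approaches 1 as $s$ grows, while for moderate $s$ the discarded tails are still visible, as the error term $e^{f(s)-s^{1/3}/10}$ suggests.

```python
import math


def truncated_ratio(s, steps=100_000):
    """[(1/s) * integral of e^{f(x)} over |x-s| <= s^(2/3)] / Gamma(s),
    where f(x) = -x + s log x; numerator and denominator are divided by e^{f(s)}."""
    w = s ** (2 / 3)
    a, b = s - w, s + w
    h = (b - a) / steps

    def g(x):
        # e^{f(x) - f(s)} written as exp(s (log(1+u) - u)) with u = (x - s)/s
        u = (x - s) / s
        return math.exp(s * (math.log1p(u) - u))

    total = g(a) + g(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(a + k * h)
    truncated = total * h / 3 / s
    gamma_scaled = math.exp(math.lgamma(s) - s * math.log(s) + s)   # Gamma(s) / (s/e)^s
    return truncated / gamma_scaled


for s in [10**3, 10**4, 10**5]:
    print(s, truncated_ratio(s))
```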


The above formula can be further extended to complex values of $s$, and
the error term can be improved.

Theorem 1.13 (Stirling’s formula II). Fix $\delta>0$. Uniformly for $s\in\mathbb C$ with
$|s|\ge1$ and $|\arg(s)|\le\pi-\delta$, we have that

$$\Gamma(s)=(s/e)^s\sqrt{2\pi/s}\,\big(1+O(1/|s|)\big).$$

Proof. For general complex values of $s$, the function $f(x)=-x+s\log x$
does not have a stationary point on the semiline $\mathbb R_{>0}$. One approach to
proving the theorem is to employ Cauchy’s residue theorem to write

$$\Gamma(s)=\frac1s\int_Le^{-z}z^s\,dz,$$

where $L$ is the semiline $\{z\in\mathbb C:z=\lambda s,\ \lambda>0\}$ traversed from $0$ to $\infty$. The
new contour contains the stationary point $z=s$, and we could adapt the saddle-point
argument given above for real $s$ to estimate $\Gamma(s)$. This is rather complicated in practice
(and an excellent exercise). Instead, we use a trick.




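We begin by noticing that

$$\Gamma(s)=\lim_{n\to\infty}\frac{\Gamma(s+n+1)}{s(s+1)\cdots(s+n)}, \tag{1.21}$$

which is immediate from (1.16), since the expression inside the limit equals $\Gamma(s)$
for every $n$. We will use the method of proof of (1.20) to show that

$$\Gamma(s+n+1)\sim n^s(n/e)^n\sqrt{2\pi n}\quad(n\to\infty). \tag{1.22}$$

For the purposes of proving (1.22), $s$ is considered fixed; write $\sigma=\mathrm{Re}(s)$. We then have

$$\begin{aligned}
\Gamma(n+s+1)&=\int_{|x-n|\le n^{2/3}}e^{-x}x^{s+n}\,dx+O\Big(\int_{|x-n|>n^{2/3}}e^{-x}x^{\sigma+n}\,dx\Big)\\
&=\int_{|x-n|\le n^{2/3}}e^{-x}x^{s+n}\,dx+o_{n\to\infty}\big(n^\sigma(n/e)^n\sqrt n\big),
\end{aligned}$$

where we bounded the error term using a variant of (1.18) with $n+\sigma$ in place
of $s$. For the main term, we note that $x^s\sim n^s$ when $|x-n|\le n^{2/3}$. On the
other hand, we may estimate the integral of $e^{-x}x^n$ over $x\in[n-n^{2/3},n+n^{2/3}]$
using (1.19) (with $n$ in place of $s$). This proves (1.22).
Now, combining (1.21) and (1.22), we arrive at the formula

$$\log\Gamma(s)=\lim_{n\to\infty}\Big((s+n)\log n-n+\frac12\log(2\pi n)-\log s-\sum_{j=1}^n\log(s+j)\Big). \tag{1.23}$$

We employ the Euler-Maclaurin formula to estimate the sum over $j$:

$$\begin{aligned}
\sum_{j=1}^n\log(s+j)&=\int_0^n\log(s+x)\,dx+\int_0^n\frac{\{x\}}{s+x}\,dx\\
&=(s+n)\log(s+n)-s\log s-n+\frac{\log(s+n)-\log s}{2}+\int_0^n\frac{\{x\}-1/2}{s+x}\,dx.
\end{aligned}$$

If we set $F(x)=\int_0^x(\{t\}-1/2)\,dt\ll1$ and integrate by parts as in the proof
of Theorem 1.12, we find that

$$\sum_{j=1}^n\log(s+j)=(s+n)\log(s+n)-s\log s-n+\frac{\log(s+n)-\log s}{2}+\int_0^n\frac{F(x)}{(s+x)^2}\,dx.$$

Inserting the above formula into (1.23) yields that

$$\log\Gamma(s)=s\log s-s+\frac12\log\frac{2\pi}{s}-\int_0^\infty\frac{F(x)}{(s+x)^2}\,dx.$$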




It remains to show that the integral on the right-hand side of the above
equality is $\ll1/|s|$. Indeed, if $s=\sigma+it$ with $\sigma\ge0$, then for $x\ge0$ we have
$|x+s|^2=(x+\sigma)^2+t^2\ge x^2+|s|^2\ge(x+|s|)^2/2$, so that

$$\int_0^\infty\frac{F(x)}{(s+x)^2}\,dx\ll\int_0^\infty\frac{dx}{(x+|s|)^2}=\frac1{|s|}.$$

Finally, if $s=\sigma+it$ with $\sigma<0$, then our assumption that $|\arg(s)|\le\pi-\delta$
implies that $|\sigma|\ll_\delta|t|$, and thus $|s|\asymp|t|+|\sigma|\asymp_\delta|t|$. Hence

$$\int_0^\infty\frac{F(x)}{(s+x)^2}\,dx\ll\int_0^\infty\frac{dx}{(x-|\sigma|)^2+t^2}\le\int_{-\infty}^\infty\frac{dy}{y^2+t^2}=\frac{\pi}{|t|}\ll_\delta\frac1{|s|},$$

which completes the proof.
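Theorem 1.13 can be tested for complex $s$ without any special-function library by truncating the limit $\Gamma(s)=\lim_{N\to\infty}N!\,N^s/(s(s+1)\cdots(s+N))$ that follows from (1.21) and (1.22). The Python sketch below is our own illustration: the cutoff `N = 200_000` and the helper names are arbitrary choices, and the truncation introduces a relative error of roughly $|s|^2/N$, which is small compared with $1/|s|$ for the sample points used. Logarithms are summed with `math.fsum` to limit rounding, and any multiple of $2\pi i$ picked up along the way disappears after exponentiating.

```python
import cmath
import math


def log_gamma_truncated(s, N=200_000):
    """log of N! N^s / (s (s+1) ... (s+N)), determined only modulo 2*pi*i."""
    logs = [cmath.log(s + j) for j in range(N + 1)]
    total = complex(math.fsum(z.real for z in logs), math.fsum(z.imag for z in logs))
    return math.lgamma(N + 1) + s * math.log(N) - total


def log_stirling(s):
    """log of (s/e)^s sqrt(2*pi/s), using the principal branch of log s."""
    log_s = cmath.log(s)
    return s * (log_s - 1) + 0.5 * math.log(2 * math.pi) - 0.5 * log_s


for s in [10 + 10j, -5 + 20j, 3 - 40j]:
    ratio = cmath.exp(log_gamma_truncated(s) - log_stirling(s))
    print(f"s = {s}: |ratio - 1| = {abs(ratio - 1):.2e}, 1/|s| = {1 / abs(s):.2e}")
```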


We conclude our discussion of the Gamma function by proving that it
can be represented by an infinite product. (See Exercise 1.14 for the rigorous
definition of convergence of infinite products.)

Theorem 1.14. For all $s\in\mathbb C\setminus\{0,-1,-2,\dots\}$, we have that

$$\Gamma(s)=\frac1s\prod_{n=1}^\infty\frac{(1+1/n)^s}{1+s/n}=\frac{e^{-\gamma s}}{s}\prod_{n=1}^\infty\frac{e^{s/n}}{1+s/n}.$$

In particular, $\Gamma$ does not have any zeroes.

Proof. By (1.21) and (1.22) with $N$ in place of $n$, we have that

$$\Gamma(s)=\lim_{N\to\infty}\frac{N!\,N^s}{s(s+1)\cdots(s+N)}=\frac1s\lim_{N\to\infty}\prod_{n=1}^N\frac{(1+1/n)^s}{1+s/n},$$

since $\prod_{n=1}^N(1+1/n)^s=(N+1)^s$ and $(N+1)^s/N^s\to1$, and the first equality follows. Finally,
the second equality follows by noticing that $\prod_{n=1}^N(1+1/n)=N+1\sim e^{-\gamma}\prod_{n=1}^Ne^{1/n}$
as $N\to\infty$, by Theorem 1.11.
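For real $s$ the second product in Theorem 1.14 can be truncated and compared with `math.gamma`. The Python sketch below is our own illustration; it hard-codes $\gamma$ to 16 digits, uses a hypothetical helper name, and works with logarithms of absolute values while tracking the sign separately (the factors $1+s/n$ are negative when $n<-s$). Truncating at $n=N$ causes a relative error of roughly $s^2/(2N)$.

```python
import math

EULER_GAMMA = 0.5772156649015329   # known value of the Euler-Mascheroni constant


def gamma_weierstrass(s, N=100_000):
    """Truncation at n = N of Gamma(s) = (e^{-gamma s}/s) prod_{n>=1} e^{s/n}/(1 + s/n), real s."""
    log_abs = -EULER_GAMMA * s - math.log(abs(s))
    sign = 1 if s > 0 else -1
    for n in range(1, N + 1):
        factor = 1 + s / n
        log_abs += s / n - math.log(abs(factor))
        if factor < 0:
            sign = -sign
    return sign * math.exp(log_abs)


for s in [0.5, 2.5, -2.5, 7.3]:
    print(s, gamma_weierstrass(s), math.gamma(s))
```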


Exercises 


Exercise 1.1. Consider the following functions: 


file) =a see, f(x) = eV™8*,  fa(z) = (logz)*, fa(z) = Vz, 
x a 


fs(v) =e", fe(a) = (ea fr(a) = ord fs(x) = log log a, 


where $A$ is a fixed positive real number. Order the functions in terms of their
order of magnitude as $x\to\infty$, namely find a permutation $\sigma\in S_8$ such that
$f_{\sigma(1)}(x)\ll f_{\sigma(2)}(x)\ll\cdots\ll f_{\sigma(8)}(x)$ when $x\to\infty$.

Exercise 1.2. Show the following asymptotic estimates:

(a) $\log(1+\delta)=\delta+O(\delta^2)$ for $\delta\in[-1/2,1/2]$.

(b) $\sqrt{x+1}=\sqrt x+O(1/\sqrt x)$ for $x\ge1$.

(c) $e^\delta=1+O(\delta)$ for $|\delta|\le1$.

(d) If $p>1$, then $\sum_{n>x}1/n^p\ll_p x^{1-p}$ for all $x\ge1$.

(e) Let $\rho\in(0,1)$ and consider a sequence $(a_n)_{n=1}^\infty$ such that $0\le a_{n+1}\le\rho a_n$ for
all $n\ge1$. Then $\sum_{n=1}^\infty a_n\asymp_\rho a_1$.


Exercise 1.3. Show that there is a polynomial $P_k$ of degree $k+1$ and of leading
coefficient $1/(k+1)$ such that

$$\sum_{n=1}^Nn^k=P_k(N)\quad\text{for all }N\in\mathbb N.$$

[Hint: Use Abel’s summation formula (1.9) and induction on $k$.]


Exercise 1.4.
(a) Prove that
$$\sum_{n\le x}\sqrt n=\frac23x^{3/2}+O(\sqrt x)\quad(x\ge1).$$

(b) Prove that there is some constant $c$ such that
$$\sum_{n\le x}\frac1{\sqrt n}=2\sqrt x+c+O\Big(\frac1{\sqrt x}\Big)\quad(x\ge1).$$


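Exercise 1.5.
(a) If $f:\mathbb R_{\ge1}\to\mathbb R_{\ge0}$ is decreasing, then show that
$$\int_{\lfloor x\rfloor+1}^\infty f(t)\,dt\le\sum_{n>x}f(n)\le\int_{\lfloor x\rfloor}^\infty f(t)\,dt\quad(x\ge1).$$

(b) Prove that
$$\frac1{\delta(x+1)^\delta}<\sum_{n>x}\frac1{n^{1+\delta}}<\frac1{\delta(x-1)^\delta}\quad(\delta>0,\ x\ge2)$$
and
$$\log x\le\sum_{n\le x}\frac1n\le1+\log x\quad(x\ge1).$$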


Exercise 1.6. A number $n$ is called square-full if $p^2\mid n$ for all primes $p\mid n$.

(a) Show that $n$ is square-full if and only if it can be written as $n=a^2b^3$ for some
integers $a,b$.
(b) Prove that $\#\{n\le x:n\text{ is square-full}\}\asymp\sqrt x$ for $x\ge1$.

Exercise 1.7. Define Chebyshev’s psi function
$$\psi(x)=\sum_{p^k\le x}\log p.$$
(a) Prove that $|\psi(x)-\theta(x)|\le\sqrt x\log x$ for all $x\ge1$.

(b) Prove that the asymptotic relations $\pi(x)\sim x/\log x$, $\theta(x)\sim x$ and $\psi(x)\sim x$
are equivalent as $x\to\infty$.




(c) Prove that the asymptotic estimates $\pi(x)=\mathrm{li}(x)+O(\sqrt x\log x)$, $\theta(x)=x+O(\sqrt x\log^2x)$
and $\psi(x)=x+O(\sqrt x\log^2x)$ are equivalent in the range $x\ge2$.

(d) Fix $c>0$. Prove that the estimates $\pi(x)=\mathrm{li}(x)+O(xe^{-c\sqrt{\log x}})$, $\theta(x)=x+O(xe^{-c\sqrt{\log x}}\log x)$
and $\psi(x)=x+O(xe^{-c\sqrt{\log x}}\log x)$ are equivalent in the
range $x\ge2$.


Exercise 1.8. Let $(a_n)_{n=1}^\infty$ be a sequence of complex numbers, and define

$$M(x)=\frac1x\sum_{n\le x}a_n\quad\text{and}\quad L(x)=\frac1{\log x}\sum_{n\le x}\frac{a_n}{n}$$

to be its mean value and its logarithmic mean value, respectively.
(a) If $\lim_{x\to\infty}M(x)=\ell$, then show that $\lim_{x\to\infty}L(x)=\ell$ as well.

(b*) Construct a sequence of $a_n\in[0,1]$ for which $L(x)$ tends to a limit as $x\to\infty$,
whereas $M(x)$ does not.


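Exercise 1.9. Here we study the asymptotic behavior of a Poisson distribution of
parameter $\lambda\to\infty$. Throughout we fix $\epsilon>0$ and $C\ge2$.

(a) Recall Exercise 1.2(e), and let $Q(u)=u\log u-u+1$. Show that:

(i) $\displaystyle\sum_{n\ge u\lambda}\frac{e^{-\lambda}\lambda^n}{n!}\asymp\frac{e^{-Q(u)\lambda}}{\sqrt\lambda}$ if $1+\epsilon\le u\le C$;

(ii) $\displaystyle\sum_{0\le n\le u\lambda}\frac{e^{-\lambda}\lambda^n}{n!}\asymp\frac{e^{-Q(u)\lambda}}{\sqrt\lambda}$ if $\epsilon\le u\le1-\epsilon$.

(b) For fixed $\alpha<\beta$ and $\lambda\to\infty$, show that

$$\sum_{\lambda+\alpha\sqrt\lambda\le n\le\lambda+\beta\sqrt\lambda}\frac{e^{-\lambda}\lambda^n}{n!}\to\frac1{\sqrt{2\pi}}\int_\alpha^\beta e^{-x^2/2}\,dx.$$

[Hint: First, prove the estimate $e^{-\lambda}\lambda^n/n!\sim(2\pi\lambda)^{-1/2}e^{-(n-\lambda)^2/(2\lambda)}$ when $n=\lambda+O(\sqrt\lambda)$
and $\lambda\to\infty$.]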


Exercise 1.10. This exercise generalizes the proof of Theorem 1.12.

We define the sequence of the Bernoulli polynomials $B_n(x)$ and of the Bernoulli
numbers $B_n$ as follows: we let $B_0(x)=B_0=1$, $B_1=-1/2$ and $B_1(x)=x+B_1$.
We then let $B_2(x)=B_2+2\int_0^xB_1(t)\,dt$, where $B_2$ is such that $\int_0^1B_2(x)\,dx=0$,
that is to say, $B_2=1/6$ and $B_2(x)=x^2-x+1/6$. In general, assuming we have
defined $B_n(x)$, we let $B_{n+1}(x)=B_{n+1}+(n+1)\int_0^xB_n(t)\,dt$, where $B_{n+1}$ is such
that $\int_0^1B_{n+1}(x)\,dx=0$.
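The recursive definition of $B_n(x)$ above translates directly into exact polynomial arithmetic. The Python sketch below is our own illustration (the function names are hypothetical); polynomials are stored as lists of `Fraction` coefficients, index $k$ holding the coefficient of $x^k$. Comparing the output with the values stated in the exercise, $B_1=-1/2$ and $B_2(x)=x^2-x+1/6$, is a quick way to confirm the construction.

```python
from fractions import Fraction


def antiderivative_from_0(poly):
    """Coefficients of the integral from 0 to x of p(t) dt, where poly[k] multiplies x^k."""
    return [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(poly)]


def bernoulli_polynomials(n_max):
    """[B_0(x), ..., B_{n_max}(x)] built exactly as in Exercise 1.10."""
    polys = [[Fraction(1)]]                                   # B_0(x) = 1
    for n in range(n_max):
        q = [(n + 1) * c for c in antiderivative_from_0(polys[-1])]
        mean_over_unit_interval = sum(c / (k + 1) for k, c in enumerate(q))
        q[0] = -mean_over_unit_interval   # the constant B_{n+1} that makes the mean over [0, 1] zero
        polys.append(q)
    return polys


polys = bernoulli_polynomials(8)
print([str(p[0]) for p in polys])   # Bernoulli numbers B_n = B_n(0)
print([str(c) for c in polys[2]])   # coefficients of B_2(x), constant term first
```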

(a) For $n\ne1$, show that $B_n(1)=B_n(0)=B_n$. Conclude that the function $x\mapsto B_n(\{x\})$
is 1-periodic and continuous. In addition, show that $\int_0^xB_n(\{t\})\,dt=(B_{n+1}(\{x\})-B_{n+1})/(n+1)$
for all $n\ge1$ and $x\in\mathbb R$.




(b) Given integers $a<b$ and $k\ge1$, and a smooth function $f$, prove that

$$\sum_{a<n\le b}f(n)=\int_a^bf(x)\,dx+\frac{f(b)-f(a)}2+\sum_{j=2}^k\frac{(-1)^jB_j}{j!}\big(f^{(j-1)}(b)-f^{(j-1)}(a)\big)+\frac{(-1)^{k+1}}{k!}\int_a^bB_k(\{x\})f^{(k)}(x)\,dx.$$

(c) Let $m\in\mathbb Z$ and $k\in\mathbb N$. Show that

$$\int_0^1B_k(x)e^{-2\pi imx}\,dx=\begin{cases}0&\text{if }m=0,\\[2pt]-\dfrac{k!}{(2\pi im)^k}&\text{if }m\ne0.\end{cases}$$

[Hint: Use part (b) when $m\ne0$.] Conclude that

$$B_k(\{x\})=-\frac{k!}{(2\pi i)^k}\sum_{m\ne0}\frac{e^{2\pi imx}}{m^k}\quad\text{for }k\ge2.$$

(d) For $k\ge1$, show that $B_{2k+1}=0$ and

$$B_{2k}=\frac{(-1)^{k-1}(2k)!}{2^{2k-1}\pi^{2k}}\sum_{m\ge1}\frac1{m^{2k}}=\frac{(-1)^{k-1}(2k)!\,\zeta(2k)}{2^{2k-1}\pi^{2k}}.$$

(e) Show that $B_n(x)=\sum_{k=0}^n\binom nkB_{n-k}x^k$ for $n\ge0$, and deduce that $B_n(x+1)=B_n(x)+nx^{n-1}$
for $n\ge1$.

(f) Prove the recursion formula $B_n=-\frac1{n+1}\sum_{k=0}^{n-1}\binom{n+1}kB_k$ for $n\ge1$,
and deduce from it that $|B_n|\le(4/5)^nn!$.

(g) Consider the generating series $F(z,x)=\sum_{n\ge0}B_n(x)z^n/n!$. Prove that $\partial F/\partial x=zF$,
as well as that $F(z,1)-F(z,0)=z$. Deduce that $F(z,x)=e^{xz}z/(e^z-1)$.

(h) Show that $F(z,0)+z/2=z/(e^z-1)+z/2$ is an even function, and give a new
proof that $B_{2n+1}=0$ for $n\ge1$.

(i) Noticing that $z/(e^z-1)=1/\big(1+\sum_{n\ge1}z^n/(n+1)!\big)$, give an explicit formula
for $B_n$.


Exercise 1.11. (a) For each $n\in\mathbb Z_{\ge0}$, let $I_n:=\int_0^{\pi/2}(\cos x)^n\,dx$. Show that

$$I_{2k}=\frac{1\cdot3\cdots(2k-1)}{2\cdot4\cdots(2k)}\cdot\frac\pi2\quad\text{and}\quad I_{2k+1}=\frac{2\cdot4\cdots(2k)}{1\cdot3\cdots(2k+1)}.$$

(b) Show that $I_{n+1}\sim I_n$ as $n\to\infty$. [Hint: Show that most of the mass of the
integral defining $I_n$ is concentrated around $x=0$.]
(c) If $c$ is the constant from the proof of Theorem 1.12, show that $e^c=\sqrt{2\pi}$.

(d) Use the saddle-point method to prove that $I_n\sim\sqrt{\pi/(2n)}$.


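Exercise 1.12. Fix $\delta>0$, and let $s\in\mathbb C$ with $|s|\ge1$ and $|\arg(s)|\le\pi-\delta$.
(a) If $s=\sigma+it$, prove that
$$|\Gamma(s)|\asymp_\delta|s|^{\sigma-1/2}e^{-\sigma-t\arg(s)}.$$

In particular, if $|\sigma|\le C$ and $|t|\ge1$, then $|\Gamma(s)|\asymp_C|t|^{\sigma-1/2}e^{-\pi|t|/2}$.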




(b) Show that
$$(\Gamma'/\Gamma)(s)=\log s-1/(2s)+O(1/|s|^2).$$
[Hint: If $f(s)=\log\big(\Gamma(s)/((s/e)^s\sqrt{2\pi/s})\big)$ and $\epsilon$ is small enough in terms of $\delta$,
then $f'(s)=(2\pi i)^{-1}\oint_{|z|=\epsilon|s|}f(s+z)z^{-2}\,dz\ll_{\epsilon,\delta}1/|s|^2$.]


Exercise 1.13. For all $s\in\mathbb C$, show the reflection and the duplication formulas of
the Gamma function:

$$\Gamma(s)\Gamma(1-s)=\frac{\pi}{\sin(\pi s)}\quad\text{and}\quad\Gamma(2s)=\pi^{-1/2}2^{2s-1}\Gamma(s)\Gamma(s+1/2).$$

[Hint: Show that $\Gamma(s)\Gamma(1-s)\sin(\pi s)$ and $4^s\Gamma(s)\Gamma(s+1/2)/\Gamma(2s)$ are entire,
1-periodic and bounded functions.]

Exercise 1.14. Given a sequence $(a_n)_{n=1}^\infty\subset\mathbb C$, consider the infinite product
$\prod_{n=1}^\infty(1+a_n)$ and the partial products $P_{M,N}=\prod_{n=M}^N(1+a_n)$. We then define
the following notions:

• $\prod_{n=1}^\infty(1+a_n)$ diverges to zero if $\lim_{N\to\infty}P_{M,N}=0$ for all $M\in\mathbb Z_{\ge1}$.

• $\prod_{n=1}^\infty(1+a_n)$ converges (conditionally) if there is some $p\ne0$ and some
$M\in\mathbb Z_{\ge1}$ such that $\lim_{N\to\infty}P_{M,N}=p$. We then define $\prod_{n=1}^\infty(1+a_n)=p\prod_{n=1}^{M-1}(1+a_n)$
and say that $\prod_{n=1}^\infty(1+a_n)$ converges to $p\prod_{n=1}^{M-1}(1+a_n)$.

• $\prod_{n=1}^\infty(1+a_n)$ converges absolutely if $\prod_{n=1}^\infty(1+|a_n|)$ converges.

(a) Check that the definition of conditional convergence of $\prod_{n=1}^\infty(1+a_n)$ does not
depend on the choice of $M$. [Hint: Verify first that if $\prod_{n=1}^\infty(1+a_n)$ converges,
then $1+a_n\ne0$ for all sufficiently large $n$.]

(b) Assume that $\prod_{n=1}^\infty(1+a_n)$ converges. Show that $\prod_{n=1}^\infty(1+a_n)=0$ if and only
if there is some $n$ such that $1+a_n=0$.

(c) Show that if $\prod_{n=1}^\infty(1+a_n)$ converges, then $\lim_{n\to\infty}a_n=0$.

(d) Show that $\prod_{n=1}^\infty(1+a_n)$ converges if and only if for each $\epsilon>0$ there is an
integer $N=N(\epsilon)$ such that $|P_{N_1,N_2}-1|<\epsilon$ for $N_2\ge N_1\ge N$.
[Hint: Use Cauchy’s criterion for the convergence of sequences.]

(e) Show that if $\prod_{n=1}^\infty(1+a_n)$ converges absolutely, then it also converges condition-
ally. [Hint: Expand $\prod_{n=N_1}^{N_2}(1+a_n)-1$ and compare it with $\prod_{n=N_1}^{N_2}(1+|a_n|)-1$.]

(f) Show that $\prod_{n=1}^\infty(1+a_n)$ converges absolutely if and only if so does the series
$\sum_{n=1}^\infty a_n$. [Hint: In both cases $\lim_{n\to\infty}a_n=0$, whence $|a_n|/2\le\log(1+|a_n|)\le|a_n|$
for large $n$.]


(g) Show that $\prod_{n=1}^\infty(1+a_n)$ converges if and only if there is some $N\in\mathbb Z_{\ge1}$ such
that $|a_n|<1$ for $n\ge N$ and the series $\sum_{n\ge N}\log(1+a_n)$ converges, where $\log$
denotes the principal branch of the logarithm (i.e., $\log x\in\mathbb R$ for $x>0$), as
follows:

1) Note that if either $\prod_{n=1}^\infty(1+a_n)$ or $\sum_{n\ge N}\log(1+a_n)$ converges, the sequence
$(a_n)_{n=1}^\infty$ converges to $0$. Conclude that in either case there is $N_0\in\mathbb Z_{\ge1}$
such that $\mathrm{Re}(1+a_n)>0$ and $|a_n|<1$ for $n\ge N_0$.

2) Set $S_{N_1,N_2}=\sum_{n=N_1}^{N_2}\log(1+a_n)$ for $N_2\ge N_1\ge N_0$, where $N_0$ is as above.
Show that there are integers $k_{N_1,N_2}$ such that $\log P_{N_1,N_2}=S_{N_1,N_2}+2\pi ik_{N_1,N_2}$
for $N_2\ge N_1\ge N_0$.

3) Assume that $\lim_{n\to\infty}a_n=0$. Show that there is some $N_0'\ge N_0$ such
that $|S_{N_1,N_2}-S_{N_1,N_2+1}|<\pi$ and $|\log P_{N_1,N_2}-\log P_{N_1,N_2+1}|<\pi$ for
$N_2\ge N_1\ge N_0'$. Use induction on $N_2$ to conclude that $k_{N_1,N_2}=0$ for
$N_2\ge N_1\ge N_0'$.
4) Prove the equivalence stated in part (g).
(h) Show that if $\sum_{n=1}^\infty|a_n|^2<\infty$, then the product $\prod_{n=1}^\infty(1+a_n)$ converges if and
only if the series $\sum_{n=1}^\infty a_n$ converges.
(i) Assume that $a_n\ne-1$ for all $n$. Is it possible that $\prod_{n=1}^\infty(1+a_n)$ diverges and
$\sum_{n=1}^\infty a_n$ converges? Is it possible that $\prod_{n=1}^\infty(1+a_n)$ converges and $\sum_{n=1}^\infty a_n$
diverges?


Chapter 2


Combinatorial ways to count primes


Perhaps the oldest way of counting primes is the sieve of Eratosthenes.
Named after the ancient Greek mathematician Eratosthenes of Cyrene, it
is an algorithm that determines all primes up to a given threshold $x$. At
its core lies the fact that any composite integer $n>1$ has a prime divisor
$p\le\sqrt n$. The steps of the algorithm are:

(1) List all integers in $[2,x]$.
(2) Circle the number 2 and delete all proper multiples of it.
(3) Find the smallest $n\in[3,\sqrt x]$ that has been neither deleted nor circled yet,
and circle it. If such an $n$ does not exist, circle all integers that have not
been deleted yet and terminate the algorithm.
(4) Delete all proper multiples of $n$ and return to step (3).

It is clear that after the termination of the algorithm the circled integers
will be exactly the primes $\le x$. The algorithm of Eratosthenes is called a
“sieve” because the only integers that “do not pass through it”, that is to
say, are not deleted at any stage of the algorithm, are the primes $\le x$.
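A direct Python transcription of steps (1) to (4) might look as follows. This is a sketch for illustration (the function name is ours); it scans candidates in increasing order, which has the same effect as repeatedly looking for the smallest integer that is neither deleted nor circled.

```python
import math


def eratosthenes(x):
    """Return the list of primes <= x, following steps (1)-(4) of the sieve."""
    deleted = [False] * (x + 1)      # step (1): the integers 2, ..., x are listed
    circled = []
    n = 2
    while n <= math.isqrt(x):        # steps (2)-(4) only run over n <= sqrt(x)
        if not deleted[n]:           # smallest number neither deleted nor circled
            circled.append(n)
            for multiple in range(2 * n, x + 1, n):   # delete proper multiples of n
                deleted[multiple] = True
        n += 1
    # termination in step (3): circle every number that survived
    circled += [m for m in range(n, x + 1) if not deleted[m]]
    return circled


print(eratosthenes(50))
```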


The idea of Eratosthenes was further developed by Legendre, who used
it to write down a formula for $\pi(x)$. Indeed, an integer $n\in(\sqrt x,x]$ is prime
if and only if it has no prime factors $\le\sqrt x$. We thus arrive at the formula

$$\pi(x)=\#\{n\le x:(n,P(\sqrt x))=1\}+O(\sqrt x), \tag{2.1}$$

where we recall that

$$P(y)=\prod_{p\le y}p.$$




Consider the more general problem of estimating the number of integers
$\le x$ that are coprime to some integer $m$. Since $(n+m,m)=(n,m)$, the
condition that $n$ and $m$ are coprime is $m$-periodic. In particular, every
interval of the form $(a,a+m]$ contains the same number of integers $n$ coprime to $m$.
This number is given by Euler’s totient function

$$\varphi(m)=\#\{1\le n\le m:(n,m)=1\}=\#(\mathbb Z/m\mathbb Z)^*,$$

where $(\mathbb Z/m\mathbb Z)^*$ denotes the group of reduced residues mod $m$. We will give
a formula for the number of integers $\le x$ that are coprime to $m$ in terms
of $\varphi(m)$, but first we establish some fundamental properties of the totient
function.


The Chinese Remainder Theorem implies the group isomorphism

$$(\mathbb Z/ab\mathbb Z)^*\cong(\mathbb Z/a\mathbb Z)^*\times(\mathbb Z/b\mathbb Z)^*\quad\text{whenever }(a,b)=1.$$

We infer from this relation that $\varphi(ab)=\varphi(a)\varphi(b)$ whenever $(a,b)=1$. Any
function $f:\mathbb N\to\mathbb C$ satisfying the functional equation

$$f(ab)=f(a)f(b)\quad\text{whenever }(a,b)=1 \tag{2.2}$$

and the condition $f(1)=1$ is called multiplicative. If (2.2) holds for all
$a,b\in\mathbb N$ without any restrictions on their greatest common divisor, then $f$ is
called completely multiplicative. Thus we see that $\varphi$ is multiplicative but not
completely multiplicative (for example, $\varphi(4)=2$ but $\varphi(2)^2=1$). Iterating (2.2)
with $f=\varphi$ implies that

$$\varphi(m)=\prod_{p^k\|m}\varphi(p^k).$$

Hence calculating $\varphi(m)$ is reduced to finding its value on prime powers. The
latter is easier, since the condition $(n,p^k)=1$ simplifies to the condition
$p\nmid n$. Consequently,

$$\varphi(p^k)=p^k-\#\{1\le n\le p^k:p\mid n\}=p^k-p^{k-1}.$$

We thus deduce the formula

$$\varphi(m)=m\prod_{p\mid m}\Big(1-\frac1p\Big).$$
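The product formula for $\varphi(m)$ and the multiplicativity of $\varphi$ are easy to test against the definition. The Python sketch below is our illustration, with hypothetical function names; it finds the distinct prime factors by trial division and applies each factor $1-1/p$ in exact integer arithmetic.

```python
import math


def phi_bruteforce(m):
    """phi(m) straight from the definition #{1 <= n <= m : (n, m) = 1}."""
    return sum(1 for n in range(1, m + 1) if math.gcd(n, m) == 1)


def phi_product(m):
    """phi(m) = m * prod_{p | m} (1 - 1/p), with the primes p | m found by trial division."""
    result, r, p = m, m, 2
    while p * p <= r:
        if r % p == 0:
            result -= result // p      # multiply by (1 - 1/p); p still divides result
            while r % p == 0:
                r //= p
        p += 1
    if r > 1:                          # one prime factor larger than sqrt(r) remains
        result -= result // r
    return result


print([phi_product(m) for m in range(1, 13)])   # [1, 1, 2, 2, 4, 2, 6, 4, 6, 4, 10, 4]
```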


Now, we go back to the problem of estimating the counting function of
integers coprime to $m$. As we already discussed, periodicity implies that
each interval of length $m$ contains exactly $\varphi(m)$ such integers. If $N$ is the
unique integer satisfying the inequalities $Nm\le x<(N+1)m$, then

$$N\cdot\varphi(m)\le\#\{n\le x:(n,m)=1\}\le(N+1)\cdot\varphi(m).$$

Noticing that $N=\lfloor x/m\rfloor=x/m+O(1)$, we find that

$$\begin{aligned}
\#\{n\le x:(n,m)=1\}&=(x/m+O(1))\cdot\varphi(m)\\
&=x\prod_{p\mid m}\Big(1-\frac1p\Big)+O(\varphi(m)).
\end{aligned}\tag{2.3}$$


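The remainder term in the above estimate can be improved significantly.
To do so, we reappraise the sieve of Eratosthenes-Legendre from a purely
combinatorial point of view: we have

$$\#\{n\le x:(n,m)=1\}=\#\bigcap_{p\mid m}\{n\le x:p\nmid n\}. \tag{2.4}$$

We apply the inclusion-exclusion principle to rewrite the right-hand side as

$$\#\bigcap_{p\mid m}\{n\le x:p\nmid n\}=\#\{n\le x\}-\sum_{p\mid m}\#\{n\le x:p\mid n\}+\sum_{\substack{p<p'\\p,p'\mid m}}\#\{n\le x:pp'\mid n\}-\cdots \tag{2.5}$$

$$=\lfloor x\rfloor-\sum_{p\mid m}\lfloor x/p\rfloor+\sum_{\substack{p<p'\\p,p'\mid m}}\lfloor x/(pp')\rfloor-\cdots \tag{2.6}$$

The above formula has $2^{\#\{p\mid m\}}$ summands, one for each choice of a subset
of the distinct prime factors of $m$. The quantity $\#\{p\mid m\}$ will recur several
times throughout the book, so we give it a name:

$$\omega(m):=\#\{p\mid m\},\quad\text{as well as}\quad\Omega(m):=\sum_{p^k\|m}k. \tag{2.7}$$

Inserting the approximation $\lfloor y\rfloor=y+O(1)$ into (2.6) and noticing that

$$x-\sum_{p\mid m}\frac xp+\sum_{\substack{p<p'\\p,p'\mid m}}\frac x{pp'}-\cdots=x\prod_{p\mid m}\Big(1-\frac1p\Big)$$

yields:

Theorem 2.1. For $x\ge1$ and $m\in\mathbb N$, we have

$$\#\{n\le x:(n,m)=1\}=x\prod_{p\mid m}\Big(1-\frac1p\Big)+O\big(2^{\omega(m)}\big).$$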


Remark 2.2. The above theorem has a natural probabilistic interpretation: for $n$ to be coprime to $m$, we must have that $p \nmid n$ for each $p \mid m$. The chances that a randomly chosen integer $n$ is a multiple of $p$ are about $1/p$: indeed, we have $\#\{n \le x : p \mid n\} = \lfloor x/p \rfloor \sim x/p$ as $x \to \infty$, so a $1/p$ proportion of integers are divisible by $p$. But then, the chances that an integer is not divisible by $p$ are $1 - 1/p$. Assuming that divisibility by different primes gives independent events, we are led to expect that the chances that an integer is coprime to $m$ are about $\prod_{p \mid m}(1 - 1/p)$, as proven in Theorem 2.1 when $x$ and $m$ are in appropriate ranges. We will return to this probabilistic heuristic in Chapter 15.
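The inclusion-exclusion sum (2.6) can be evaluated literally. The following minimal Python sketch (function names are hypothetical, chosen for illustration) sums over all $2^{\omega(m)}$ subsets of the primes dividing $m$, compares the result with a direct count of (2.4), and prints the main term of Theorem 2.1 for comparison.

```python
from itertools import combinations
from math import gcd, prod


def legendre_count(x: int, primes: list[int]) -> int:
    """Right-hand side of (2.6): sum over subsets S of the distinct primes
    dividing m of (-1)^{|S|} * floor(x / prod(S)); 2^{omega(m)} terms in all."""
    total = 0
    for r in range(len(primes) + 1):
        for subset in combinations(primes, r):
            total += (-1) ** r * (x // prod(subset))
    return total


def coprime_count(x: int, m: int) -> int:
    """Left-hand side of (2.4), counted directly: #{1 <= n <= x : (n, m) = 1}."""
    return sum(1 for n in range(1, x + 1) if gcd(n, m) == 1)


x, primes = 1000, [2, 3, 5, 7]               # m = 210, omega(m) = 4
main_term = x * prod(1 - 1 / p for p in primes)
print(legendre_count(x, primes), coprime_count(x, prod(primes)), round(main_term, 2))
# 228 228 228.57
```

The exact count and the main term differ by less than one here; the difficulty discussed below only appears when $\omega(m)$ is large, since the loop then runs over $2^{\omega(m)}$ subsets.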


In order to appreciate the strength of Theorem 2.1 in the context of the sieve of Eratosthenes, we need to understand the product $\prod_{p \mid m}(1 - 1/p)$ when $m = P(\sqrt{x})$. The following lemma establishes an upper bound that is sharp up to a multiplicative constant (cf. Theorem 3.4). The idea of its proof goes back to Euler and will play a fundamental role in counting primes throughout the book.

Lemma 2.3. For each $x \ge 2$,

$$\prod_{p \le x} \Big(1 - \frac{1}{p}\Big) \le \frac{1}{\log x}.$$


Proof. Instead of bounding the product from above, we consider its reciprocal

$$\prod_{p \le x} \Big(1 - \frac{1}{p}\Big)^{-1} = \prod_{p \le x} \Big(1 + \frac{1}{p} + \frac{1}{p^2} + \cdots\Big).$$

Expanding the rightmost product, we see that the summands are in one-to-one correspondence with products of the form $1/(p_1^{a_1} \cdots p_r^{a_r})$, where $p_1 < \cdots < p_r \le x$ and $a_j \ge 1$ (the empty product with $r = 0$ is also permitted). By the Fundamental Theorem of Arithmetic, this means that the summands can be reindexed as $1/n$, where the variable $n$ runs over all integers that only have prime factors $\le x$, that is to say, the set of $x$-smooth integers. In particular, this includes all integers $n \le x$, so that

$$\prod_{p \le x} \Big(1 - \frac{1}{p}\Big)^{-1} \ge \sum_{n \le x} \frac{1}{n} \ge \int_1^x \frac{dt}{t} = \log x,$$

as claimed.
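Lemma 2.3 is easy to test numerically. The sketch below (helper names are illustrative) uses the observation that on each interval $[n, n+1)$ the product over $p \le x$ is constant while $1/\log x$ decreases, so checking $\prod_{p \le n}(1 - 1/p) \le 1/\log(n+1)$ for each integer $n$ covers every real $x$ in the range.

```python
from math import log


def prime_flags(n: int) -> list[bool]:
    """Sieve of Eratosthenes: flags[k] is True iff k is prime (0 <= k <= n)."""
    flags = [False, False] + [True] * (n - 1)
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            for k in range(i * i, n + 1, i):
                flags[k] = False
    return flags


def lemma_2_3_holds(N: int) -> bool:
    """Check prod_{p<=x} (1 - 1/p) <= 1/log x for every real x in [2, N + 1)."""
    flags = prime_flags(N)
    product = 1.0
    for n in range(2, N + 1):
        if flags[n]:
            product *= 1 - 1 / n
        # worst case on [n, n+1) is x -> n + 1, where 1/log x is smallest
        if product > 1 / log(n + 1):
            return False
    return True


print(lemma_2_3_holds(100_000))  # True
```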


Combining relation (2.1), Theorem 2.1 and Lemma 2.3, it seems like we can prove that $\pi(x) \lesssim 2x/\log x$. However, this is wishful thinking, because the error term in Theorem 2.1 becomes way too big when $m = P(\sqrt{x})$: we have $\omega(m) = \pi(\sqrt{x})$, which we expect to be of size $\asymp \sqrt{x}/\log x$, so that $2^{\omega(m)}$ is far larger than $x$. The underlying reason for the failure of Theorem 2.1 in estimating $\pi(x)$ is that relation (2.6) has an enormous number of terms. As we will see in Theorem 3.4(c), this is not a mere technicality: the function $x \prod_{p \le \sqrt{x}}(1 - 1/p)$, which is the alleged main term in Theorem 2.1 when $m = P(\sqrt{x})$, is not asymptotic to $\pi(x)$ as $x \to \infty$.




Even though the above discussion puts a cap on our expectations, the sieve of Eratosthenes-Legendre can still be used to prove that primes are sparse. The fundamental observation, made by Legendre, is that since we are only after an upper bound for $\pi(x)$, we may use the simple inequality

$$\pi(x) \le \#\{n \le x : (n, P(y)) = 1\} + y,$$

where $y$ is a parameter at our disposal. This inequality follows by noticing that every prime $p > y$ is coprime to $P(y)$. We then use Theorem 2.1 to find that

$$\pi(x) \le x \prod_{p \le y} \Big(1 - \frac{1}{p}\Big) + O\big(y + 2^{\pi(y)}\big). \tag{2.8}$$

To bound the right-hand side, we apply Lemma 2.3. Taking $y = \log x$ yields

$$\pi(x) \ll \frac{x}{\log\log x}. \tag{2.9}$$
Despite the fact that the above estimate is rather weak compared to what we expect to be the truth, at least it demonstrates that approximately 100% of the integers are composite (see also Exercise 2.3). On the other hand, taking logarithms in Lemma 2.3 and using Taylor's expansion for the function $\log(1 - \delta)$ when $|\delta| \le 1/2$, we find that

$$\sum_{p \le x} \frac{1}{p} \ge \log\log x - O(1)$$

for all $x \ge 3$, which shows that primes are not too sparse.


Chebyshev’s estimate 


In 1852, Chebyshev discovered a completely different way to count primes and vastly improve (2.9). His argument was simplified significantly by Erdős. The key observation is that the central binomial coefficient $\binom{2n}{n}$ is an integer that is divisible by all primes $p \in (n, 2n]$. Indeed,

$$\binom{2n}{n} = \frac{2n(2n-1)\cdots(n+1)}{n!},$$

so if $p \in (n, 2n]$, then $p$ divides the numerator and is coprime to the denominator. Thus $p \mid \binom{2n}{n}$ for all $p \in (n, 2n]$, as claimed. But then the product of all such primes divides $\binom{2n}{n}$, and we deduce that

$$\prod_{n < p \le 2n} p \le \binom{2n}{n} \le \sum_{j=0}^{2n} \binom{2n}{j} = 4^n.$$




The rightmost inequality is almost sharp, since Stirling's formula implies that $\binom{2n}{n} \asymp 4^n/\sqrt{n}$. Taking logarithms and recalling the definition of $\theta(x)$ in Example 1.8, we find that

$$\theta(2n) - \theta(n) = \sum_{n < p \le 2n} \log p \le n \log 4$$

for all $n \in \mathbb{N}$. Applying the above inequality with $n = 2^j$, $0 \le j < k$, and summing telescopically implies that

$$\theta(2^k) \le 2^{k+1} \log 4.$$

For each $x \ge 1$, there is an integer $k \ge 0$ such that $2^{k-1} < x \le 2^k$, whence

$$\theta(x) \le \theta(2^k) \le 2^{k+1} \log 4 < 4x \log 4 < 6x.$$


We may pass from the above inequality to an upper bound for $\pi(x)$ using partial summation, as in Example 1.8: for all $x \ge 2$, we have

$$\pi(x) = \frac{\theta(x)}{\log x} + \int_2^x \frac{\theta(y)}{y \log^2 y}\, dy \le \frac{6x}{\log x} + \int_2^x \frac{6}{\log^2 y}\, dy \ll \frac{x}{\log x}.$$

An analogous lower bound can also be established by studying the prime factorization of $\binom{2n}{n}$. The details of the proof are outlined in Exercise 2.10. This leads us to:


Theorem 2.4 (Chebyshev's estimate). For $x \ge 2$, we have

$$\pi(x) \asymp \frac{x}{\log x}.$$
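The two ingredients of the upper bound can be checked numerically. The minimal Python sketch below (all names are illustrative) verifies that $\prod_{n < p \le 2n} p$ divides $\binom{2n}{n}$ and that $\binom{2n}{n} \le 4^n$, and then computes the largest value of $\theta(x)/x$ up to $10^5$. Since $\theta$ is constant on $[n, n+1)$, the maximum over real $x$ is attained at an integer.

```python
from math import comb, log, prod


def prime_flags(n: int) -> list[bool]:
    """Sieve of Eratosthenes: flags[k] is True iff k is prime (0 <= k <= n)."""
    flags = [False, False] + [True] * (n - 1)
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            for k in range(i * i, n + 1, i):
                flags[k] = False
    return flags


def check_central_binomial(n: int, flags: list[bool]) -> bool:
    """prod_{n < p <= 2n} p divides C(2n, n), and C(2n, n) <= 4^n."""
    central = comb(2 * n, n)
    block = prod(p for p in range(n + 1, 2 * n + 1) if flags[p])
    return central % block == 0 and block <= central <= 4 ** n


def max_theta_ratio(N: int, flags: list[bool]) -> float:
    """Max of theta(x)/x over real 2 <= x <= N (attained at integers)."""
    theta, best = 0.0, 0.0
    for n in range(2, N + 1):
        if flags[n]:
            theta += log(n)
        best = max(best, theta / n)
    return best


flags = prime_flags(100_000)
print(all(check_central_binomial(n, flags) for n in range(1, 300)))  # True
print(max_theta_ratio(100_000, flags) < log(4) < 6)                  # True
```

In this range $\theta(x)/x$ stays well below the constant 6 obtained in the argument above, which is consistent with the fact that Chebyshev's method gives the order of magnitude but not the sharp constant.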
Exercises 


Exercise 2.1. Let $f$ be an arithmetic function. Show the following:
(a) $f$ is multiplicative if and only if $f(n) = \prod_{p^k \| n} f(p^k)$ for all $n \in \mathbb{N}$.
(b) $f$ is completely multiplicative if and only if $f(n) = \prod_{p^k \| n} f(p)^k$ for all $n \in \mathbb{N}$.
Exercise 2.2. A function $f : \mathbb{N} \to \mathbb{C}$ is called additive if
$f(mn) = f(m) + f(n)$ whenever $(m,n) = 1$.
Show that the functions $\omega$ and $\Omega$, defined in (2.7), are additive.
Exercise 2.3. For $x \ge y \ge 3$, prove that
$$\#\{x < p \le x + y\} \ll \frac{y}{\log\log y}.$$


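Exercise 2.4 (The square-free sieve).
(a) Modify the sieve of Eratosthenes-Legendre to prove that
$$\#\{n \le x : n \text{ is square-free}\} = x \prod_p \Big(1 - \frac{1}{p^2}\Big) + O\big(2^{\sqrt{x}}\big) \qquad (x \ge 1).$$
(b) Prove that $\prod_p (1 - 1/p^2) = 6/\pi^2$. [Hint: Show that $\prod_{p \le y}(1 - 1/p^2)^{-1} = \sum_{n \in S(y)} 1/n^2$, where $S(y) = \{n \in \mathbb{N} : p \mid n \Rightarrow p \le y\}$ is the set of $y$-smooth numbers, and use Exercise …(d) with $k = 1$.]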


Exercise 2.5. Let $f(n) = \#\{(n_1, n_2) \in \mathbb{N}^2 : [n_1, n_2] = n\}$, where $[n_1, n_2]$ is the least common multiple of $n_1$ and $n_2$. Show that $f$ is multiplicative and evaluate it at prime powers.

Exercise 2.6. Set $f(n) = \varphi(n)/n$, and let $\{n_k\}_{k=1}^\infty$ be the sequence of values $n$ at which $f$ attains a “record low”, that is to say, $n_1 = 1$ and, for $k \ge 2$, $n_k$ is defined as the smallest integer $> n_{k-1}$ with $f(n_k) < f(n)$ for all $n < n_k$. (For example, $n_2 = 2$ and $n_3 = 6$.) Find a general formula for $n_k$ and $f(n_k)$.


Exercise 2.7. Recall the definition of Chebyshev's psi function from Exercise 1.7. Show that $|\psi(x) - \theta(x)| \ll \sqrt{x}$ for $x \ge 1$.


Exercise 2.8. Let $p_1 < p_2 < \cdots$ denote the sequence of primes, and let $P_k = p_1 p_2 \cdots p_k$ denote the $k$th primorial. The validity of the Prime Number Theorem can be assumed in solving this exercise.

(a) Show that $p_k \sim k \log k$ and $\log P_k \sim k \log k$ as $k \to \infty$.

(b) Show that $\omega(n) \lesssim \log n/\log\log n$ as $n \to \infty$. [Hint: What can you say about $\omega(n)$ if $n < P_k$?]

(c) Show that
$$\prod_{p \mid n} \Big(1 - \frac{1}{p}\Big) \ge (1 + o(1)) \prod_{p \le \log n} \Big(1 - \frac{1}{p}\Big) \qquad (n \to \infty).$$


Exercise 2.9. Let $\tau(n) = \#\{d \mid n\}$ be the divisor function and, more generally,
$\tau_k(n) = \#\{(d_1, \ldots, d_k) \in \mathbb{N}^k : d_1 \cdots d_k = n\}$.
(a) Show that $\tau_k$ is multiplicative.
(b) For each prime power $p^a$, show that
$k \le \tau_k(p^a) \le \min\{k^a, (a+1)^{k-1}\}$.
Conclude that $k^{\omega(n)} \le \tau_k(n) \le \min\{k^{\Omega(n)}, \tau(n)^{k-1}\}$.
(c) For each prime power $p^a$, show the exact formula
$$\tau_k(p^a) = \binom{a + k - 1}{k - 1}.$$


(d) Assuming the Prime Number Theorem, find a sequence of integers $n$ such that $\tau_k(n) = k^{(1 + o(1)) \log n/\log\log n}$. [Hint: How can you create an integer with lots of divisors?]

(e) For $y > 1$, let $\Omega(n; y) = \sum_{p^a \| n,\, p > y} a$. Show that
$\Omega(n; y) \le \log n/\log y$.
(f) Show that
$$\tau_k(n) \le \prod_{\substack{p^a \| n \\ p \le y}} (a+1)^{k-1} \prod_{\substack{p^a \| n \\ p > y}} k^a \le (2\log n + 1)^{(k-1)y} \cdot k^{\log n/\log y}.$$
Choose $y$ appropriately to conclude that
$$\tau_k(n) \le n^{(\log k + o(1))/\log\log n} \qquad (n \to \infty).$$
Exercise 2.10. Prove the lower bound in Theorem 2.4 as follows:

(a) If $v_p(m)$ denotes the $p$-adic valuation of $m$, that is to say, the exponent of the highest power of $p$ that divides $m$, show that
$$v_p(n!) = \sum_{k \ge 1} \lfloor n/p^k \rfloor.$$

(b) Show that $\lfloor 2x \rfloor - 2\lfloor x \rfloor$ is a 1-periodic function taking only the values 0 and 1. Conclude that
$$\binom{2n}{n} \le (2n)^{\pi(2n)}.$$

(c) Prove that $\pi(x) \gg x/\log x$ for $x \ge 2$.
Exercise 2.11 (Nair [147]). Let
$$I_n = \int_0^1 x^n (1 - x)^n\, dx \quad \text{and} \quad M_n = \operatorname{lcm}[n+1, n+2, \ldots, 2n+1].$$
(a) Prove that $I_n \cdot M_n$ is a non-negative integer.
(b) Prove that $I_n \le 4^{-n}$.
(c) Prove that $M_n \le (2n+1)^{\pi(2n+1)}$.
(d) Deduce a new proof of the lower bound $\pi(x) \gg x/\log x$ for $x \ge 2$.

Exercise 2.12. Find the average value of the greatest common divisor of $a$ and $b$ asymptotically, as $a$ and $b$ range over all integers up to $x$.


Chapter 3 


The Dirichlet convolution


The combinatorics of the sieve of Eratosthenes are naturally encoded in the Möbius function, which is denoted by $\mu$ and defined by

$$\mu(n) = \begin{cases} (-1)^k & \text{if } n \text{ is square-free and has } k \text{ distinct prime factors}, \\ 0 & \text{otherwise}. \end{cases}$$

The Möbius function can easily be seen to be multiplicative. Its connection to the sieve of Eratosthenes is revealed by observing that, since a natural number $n$ equals 1 if and only if it has no prime factors, the inclusion-exclusion principle implies that

$$1_{n=1} = 1 - \sum_{p \mid n} 1 + \sum_{\substack{p < p' \\ p, p' \mid n}} 1 - \cdots = \sum_{d \mid n} \mu(d). \tag{3.1}$$

This formula is known as the Möbius inversion formula. Applying it with $(n,m)$ in place of $n$, and noticing that $d \mid (n,m)$ if and only if $d \mid n$ and $d \mid m$, leads us to (2.6). The Möbius inversion formula sits naturally inside a general framework that we develop in this chapter.
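For experimentation, $\mu$ is conveniently tabulated by a sieve: each prime $p$ flips the sign of $\mu$ at its multiples, and multiples of $p^2$ are set to zero. The minimal sketch below (names are our own, not from the text) does this and then checks (3.1) by accumulating $\sum_{d \mid n} \mu(d)$ for every $n \le N$.

```python
def mobius_upto(N: int) -> list[int]:
    """mu[n] for 0 <= n <= N via a sieve (mu[0] is unused and set to 0)."""
    mu = [1] * (N + 1)
    mu[0] = 0
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for k in range(2 * p, N + 1, p):
                is_prime[k] = False
            for k in range(p, N + 1, p):
                mu[k] = -mu[k]          # one more distinct prime factor
            for k in range(p * p, N + 1, p * p):
                mu[k] = 0               # p^2 divides k: not square-free
    return mu


def divisor_sums(f: list[int]) -> list[int]:
    """Return S with S[n] = sum_{d | n} f[d] for 1 <= n < len(f)."""
    N = len(f) - 1
    S = [0] * (N + 1)
    for d in range(1, N + 1):
        for n in range(d, N + 1, d):
            S[n] += f[d]
    return S


N = 10_000
mu = mobius_upto(N)
print(mu[1:11])  # [1, -1, -1, 0, -1, 1, -1, 0, 0, 1]
S = divisor_sums(mu)
print(all(S[n] == (1 if n == 1 else 0) for n in range(1, N + 1)))  # True
```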


The ring of arithmetic functions 


We say that $f$ is an arithmetic function if it is of the form $f : \mathbb{N} \to \mathbb{C}$. We write $\mathcal{A}$ for the set of all arithmetic functions. Given $f, g \in \mathcal{A}$, we define their Dirichlet convolution $f * g$ to be the arithmetic function defined by

$$(f * g)(n) = \sum_{ab = n} f(a) g(b) = \sum_{d \mid n} f(d) g(n/d) = \sum_{d \mid n} f(n/d) g(d).$$

The triplet $(\mathcal{A}, +, *)$ is a commutative unitary ring whose unit is the function $\delta(n) := 1_{n=1}$. In this set-up, the Möbius inversion formula states that $\mu$ is the Dirichlet inverse of the constant function $1$, that is to say, its inverse with respect to the operation of the Dirichlet convolution. In general, a function $f$ possesses a Dirichlet inverse if and only if $f(1) \ne 0$. In particular, all multiplicative functions are invertible in this ring.


Note that the Dirichlet convolution preserves multiplicativity: if f and 
g are multiplicative, then so is f * g. It can also be shown that if f is 
multiplicative, then so is its Dirichlet inverse. In particular, the operation * 
renders the set of multiplicative functions an abelian group. 


Proving the above assertions about the Dirichlet convolution is a good exercise.


Convolution identities 


As we will see shortly, an important technique for estimating averages of 
various arithmetic functions f has as its starting point a decomposition of f 
as the Dirichlet convolution of two simpler arithmetic functions. With this 
in mind, we study here some important examples of such decompositions. 


One of the most classical convolution identities concerns the divisor function, for which we have

$$\tau(n) = \#\{d \mid n\} = (1 * 1)(n).$$

This formula can be generalized to all higher-order divisor functions, which we defined and studied in Exercise 2.9, by noticing that

$$\tau_k(n) = \#\{(d_1, \ldots, d_k) \in \mathbb{N}^k : d_1 \cdots d_k = n\} = (\underbrace{1 * \cdots * 1}_{k \text{ times}})(n).$$

A related identity allows us to rewrite the “sum-of-divisors function”

$$\sigma(n) := \sum_{d \mid n} d = (\mathrm{id} * 1)(n),$$

where “id” denotes here the identity function on $\mathbb{N}$, that is to say, $\mathrm{id}(n) = n$.
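A direct implementation of the definition of $f * g$ makes identities such as $\tau = 1 * 1$ and $\sigma = \mathrm{id} * 1$ easy to test. In this minimal sketch (the function name is hypothetical), arithmetic functions are stored as Python lists indexed by $n = 0, \ldots, N$, with index 0 ignored; looping over pairs $(a, b)$ with $ab \le N$ costs about $N \log N$ operations.

```python
def dirichlet_convolution(f: list[int], g: list[int]) -> list[int]:
    """(f * g)(n) = sum_{ab = n} f(a) g(b) for 1 <= n <= N.

    f and g are lists indexed by n = 0..N; index 0 is ignored.
    """
    N = min(len(f), len(g)) - 1
    h = [0] * (N + 1)
    for a in range(1, N + 1):
        for b in range(1, N // a + 1):
            h[a * b] += f[a] * g[b]
    return h


N = 1000
one = [0] + [1] * N                          # the constant function 1
ident = list(range(N + 1))                   # id(n) = n
tau = dirichlet_convolution(one, one)        # tau = 1 * 1
sigma = dirichlet_convolution(ident, one)    # sigma = id * 1
print(tau[12], sigma[12])  # 6 28
```

Commutativity of $*$ shows up as `dirichlet_convolution(ident, one) == dirichlet_convolution(one, ident)`.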
A less obvious example of a convolution identity is

$$\varphi = \mu * \mathrm{id}. \tag{3.2}$$

There are two ways to prove (3.2): either we observe that both sides are multiplicative and compare them at prime powers, or we use that

$$\varphi(n) = \sum_{1 \le m \le n} \sum_{d \mid n,\ d \mid m} \mu(d).$$

Interchanging the order of summation yields $\varphi(n) = \sum_{d \mid n} \mu(d) \cdot n/d$, which is (3.2).




Finally, it is possible to write down a convolution identity for (a close relative of) the indicator function of primes that encapsulates the Fundamental Theorem of Arithmetic. We start by expressing $n$ in terms of its prime factors, say $n = \prod_{p^a \| n} p^a$, and then take logarithms. This yields the formula

$$\log n = \sum_{p^a \| n} a \log p = \sum_{p^k \mid n} \log p,$$

because if $p^a \| n$, then $p^k \mid n$ exactly for $k \in \{1, \ldots, a\}$. We have thus proven that

$$\log = 1 * \Lambda, \tag{3.3}$$

where

$$\Lambda(n) := \begin{cases} \log p & \text{if } n = p^k \text{ for some prime } p \text{ and some integer } k \ge 1, \\ 0 & \text{otherwise} \end{cases}$$

is von Mangoldt's function, which is a very convenient weighted variant of the indicator function of the sequence of primes. As a matter of fact, due to the identity (3.3), it is often easier to obtain results about primes by working with the summatory function of $\Lambda$, i.e., Chebyshev's psi function (see Exercise 1.7), instead of $\pi(x)$. We may then pass to $\pi(x)$ using Exercise 2.7 and the discussion in Examples 1.8 and …
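Relation (3.3) can be confirmed numerically with a small script. This sketch (helper names are illustrative) computes $\Lambda(n)$ by stripping the smallest prime factor of $n$ and checks that $\sum_{d \mid n} \Lambda(d) = \log n$ up to floating-point rounding.

```python
from math import isclose, log


def smallest_prime_factor(n: int) -> int:
    """Smallest prime dividing n >= 2 (trial division)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p if n = p^k with k >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = smallest_prime_factor(n)
    while n % p == 0:
        n //= p
    return log(p) if n == 1 else 0.0


def check_log_identity(N: int) -> bool:
    """Check log n = sum_{d | n} Lambda(d), i.e. relation (3.3), for 1 <= n <= N."""
    lam = [von_mangoldt(n) for n in range(N + 1)]
    S = [0.0] * (N + 1)
    for d in range(1, N + 1):
        for n in range(d, N + 1, d):
            S[n] += lam[d]
    return all(isclose(S[n], log(n), abs_tol=1e-9) for n in range(1, N + 1))


print(check_log_identity(5000))  # True
```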


Remark 3.1. Guessing relations (3.2) and (3.3) is far from trivial. In the next chapter, we will see a more systematic method of obtaining convolution identities. Using it will explain (3.2) and (3.3) in a more intuitive way.


Dirichlet’s hyperbola method 


When an arithmetic function $f$ is the Dirichlet convolution of two simpler functions $g$ and $h$, we can estimate its partial sums using what we already know about the partial sums of $g$ and $h$. The starting point is the identity

$$\sum_{n \le x} f(n) = \sum_{n \le x} \sum_{ab = n} g(a) h(b) = \sum_{ab \le x} g(a) h(b). \tag{3.4}$$

There are several ways to rearrange the right-hand side of (3.4). An obvious one is to fix $a$ and sum over $b$. This leads us to the formula

$$\sum_{n \le x} f(n) = \sum_{a \le x} g(a) \sum_{b \le x/a} h(b).$$

The above arrangement of the summation is particularly effective when $g$ is either supported on small integers $a$ or has small partial sums. We illustrate the details by estimating the partial sums of the totient function.




Theorem 3.2. For $x \ge 2$, we have

$$\sum_{n \le x} \varphi(n) = \frac{3x^2}{\pi^2} + O(x \log x).$$

Proof. In the identity $\varphi = \mu * \mathrm{id}$, we note that the functions $\varphi$ and $\mathrm{id}$ are much bigger in modulus than $\mu$. We thus rearrange the sum as

$$\sum_{n \le x} \varphi(n) = \sum_{a \le x} \mu(a) \sum_{b \le x/a} b.$$

We have $\sum_{b \le y} b = y^2/2 + O(y)$ by the Euler-Maclaurin summation formula (Theorem 1.10), whence

$$\sum_{n \le x} \varphi(n) = \sum_{a \le x} \mu(a) \Big(\frac{x^2}{2a^2} + O(x/a)\Big) = \frac{x^2}{2} \sum_{a \le x} \frac{\mu(a)}{a^2} + O\Big(x \sum_{a \le x} \frac{1}{a}\Big),$$

where we used the triangle inequality to bound the error term. The sum over $a$ in the main term equals $c + O(\sum_{a > x} 1/a^2) = c + O(1/x)$ with $c = \sum_{a \ge 1} \mu(a)/a^2$, whereas the sum over $a$ in the error term is $\le \sum_{a \le x} 1/a \ll \log x$. To complete the proof, it remains to prove that $c = 6/\pi^2$. This identity is a special case of a relation that we will prove in the next chapter (see also Exercise …).
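To see Theorem 3.2 at work, the following minimal sketch (names are illustrative) tabulates $\varphi$ with a sieve based on the product formula $\varphi(n) = n\prod_{p \mid n}(1 - 1/p)$ and prints the normalized discrepancy $(\sum_{n \le x}\varphi(n) - 3x^2/\pi^2)/(x \log x)$, which the theorem predicts stays bounded.

```python
from math import log, pi


def totient_upto(N: int) -> list[int]:
    """phi[n] for 0 <= n <= N, via phi(n) = n * prod_{p | n} (1 - 1/p)."""
    phi = list(range(N + 1))
    for p in range(2, N + 1):
        if phi[p] == p:                    # not yet reduced, so p is prime
            for k in range(p, N + 1, p):
                phi[k] -= phi[k] // p      # multiply by (1 - 1/p), exactly
    return phi


for x in (10**3, 10**4, 10**5):
    total = sum(totient_upto(x)[1:])
    main = 3 * x**2 / pi**2
    # Theorem 3.2: (total - main) / (x log x) should stay bounded.
    print(x, total, round(main), round((total - main) / (x * log(x)), 4))
```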


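Let us now use the above ideas to estimate $\sum_{n \le x} \tau(n)$: we have

$$\sum_{n \le x} \tau(n) = \sum_{n \le x} (1 * 1)(n) = \sum_{a \le x} \sum_{b \le x/a} 1.$$

The innermost sum equals $x/a + O(1)$, whence

$$\sum_{n \le x} \tau(n) = x \sum_{a \le x} \frac{1}{a} + O(x) = x \log x + O(x),$$

by Theorem … applied to the sum over $a$. This is a genuine asymptotic formula, but the error term is only slightly smaller than the main term and we would like to do better. Reexamining our argument, we see that the approximation $\sum_{b \le x/a} 1 = x/a + O(1)$ is not very good for large values of $a$. Instead, for large $a$, it would have been much better to switch the roles of $a$ and $b$, by fixing $b$ and summing first over $a$ instead. More formally, given parameters $A, B \ge 1$ with $AB = x$, we can rearrange the sum as follows (note that $a > A$ and $ab \le x$ force $b \le x/a < x/A = B$):

$$\sum_{n \le x} \tau(n) = \sum_{ab \le x} 1 = \sum_{\substack{ab \le x \\ a \le A}} 1 + \sum_{\substack{ab \le x \\ a > A}} 1 = \sum_{a \le A} \sum_{b \le x/a} 1 + \sum_{b < B} \sum_{A < a \le x/b} 1.$$

We write the rightmost sum over $a$ as $\sum_{a \le x/b} 1 - \sum_{a \le A} 1$ to find that

$$\sum_{n \le x} \tau(n) = \sum_{a \le A} \sum_{b \le x/a} 1 + \sum_{b < B} \sum_{a \le x/b} 1 - \Big(\sum_{a \le A} 1\Big)\Big(\sum_{b < B} 1\Big). \tag{3.5}$$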




The estimate $\sum_{n \le y} 1 = y + O(1)$ implies that

$$\sum_{n \le x} \tau(n) = \sum_{a \le A} \big(x/a + O(1)\big) + \sum_{b < B} \big(x/b + O(1)\big) - \big(A + O(1)\big)\big(B + O(1)\big).$$

Gathering all remainder terms, we rewrite the above formula as

$$\sum_{n \le x} \tau(n) = x \sum_{a \le A} \frac{1}{a} + x \sum_{b < B} \frac{1}{b} - AB + O(A + B).$$

Applying Theorem … twice and recalling that $AB = x$, we deduce that

$$\sum_{n \le x} \tau(n) = x\big(\log(AB) + 2\gamma + O(1/A + 1/B)\big) - AB + O(A + B) = x \log x + (2\gamma - 1)x + O(A + B).$$

The optimal choice is $A = B = \sqrt{x}$, which yields Dirichlet's famous estimate:

Theorem 3.3. For $x \ge 1$, we have

$$\sum_{n \le x} \tau(n) = x \log x + (2\gamma - 1)x + O(\sqrt{x}).$$


The method of proof of Theorem 3.3 is called Dirichlet's hyperbola method. Its name is justified by a geometric reappraisal of it. The sum $\sum_{ab \le x} 1$ counts the number of lattice points $(a,b) \in \mathbb{N} \times \mathbb{N}$ on or below the hyperbola $ab = x$. The way we rearranged this sum corresponds to writing the range of $(a,b)$ as $X \cup Y$, where $X = \{(a,b) \in \mathbb{N}^2 : ab \le x,\ a \le A\}$ and $Y = \{(a,b) \in \mathbb{N}^2 : ab \le x,\ b < B\}$. We then use inclusion-exclusion to infer that

$$\sum_{n \le x} \tau(n) = |X \cup Y| = |X| + |Y| - |X \cap Y|,$$

which is relation (3.5). Dirichlet's hyperbola method is a key tool in analytic number theory and we will encounter it several times in this book.
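The geometric picture translates directly into an algorithm. In the minimal sketch below (names are illustrative), both cut-offs are taken at $r = \lfloor \sqrt{x} \rfloor$ with $X = \{ab \le x,\ a \le r\}$ and $Y = \{ab \le x,\ b \le r\}$, a symmetric variant of the choice $A = B = \sqrt{x}$. Since $r^2 \le x$, the set $X \cap Y$ is the full $r \times r$ square, so $\sum_{n \le x}\tau(n) = 2\sum_{a \le r}\lfloor x/a \rfloor - r^2$ can be computed in about $\sqrt{x}$ steps instead of $x$.

```python
from math import isqrt, log

EULER_GAMMA = 0.5772156649015329   # Euler's constant gamma


def divisor_summatory_naive(x: int) -> int:
    """sum_{n<=x} tau(n) = #{(a, b) in N^2 : ab <= x} = sum_{a<=x} floor(x/a)."""
    return sum(x // a for a in range(1, x + 1))


def divisor_summatory_hyperbola(x: int) -> int:
    """Same count via |X u Y| = |X| + |Y| - |X n Y| with cut-offs r = floor(sqrt x).

    By symmetry |X| = |Y| = sum_{a<=r} floor(x/a), and |X n Y| = r * r.
    """
    r = isqrt(x)
    return 2 * sum(x // a for a in range(1, r + 1)) - r * r


print(all(divisor_summatory_naive(x) == divisor_summatory_hyperbola(x) for x in range(1, 2000)))

x = 10**8
D = divisor_summatory_hyperbola(x)
error = D - (x * log(x) + (2 * EULER_GAMMA - 1) * x)
print(D, round(error, 1), isqrt(x))   # compare |error| with sqrt(x)
```

The printed error is the remainder in Theorem 3.3 at $x = 10^8$; it should come out far smaller than $\sqrt{x} = 10^4$.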


Mertens’ three estimates 


We conclude this chapter with an application of the above circle of ideas to the theory of primes due to Mertens. Using the convolution identity (3.3), he discovered in 1872, several years before the Prime Number Theorem was established, a way to estimate various sums over primes.




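Theorem 3.4 (Mertens' three estimates). For $x \ge 2$ we have:

(a) $\displaystyle\sum_{p \le x} \frac{\log p}{p} = \log x + O(1)$;

(b) $\displaystyle\sum_{p \le x} \frac{1}{p} = \log\log x + c + O(1/\log x)$, where $c$ is a constant;

(c) $\displaystyle\prod_{p \le x} \Big(1 - \frac{1}{p}\Big) = \frac{e^{-\gamma}}{\log x} + O(1/\log^2 x)$.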


Proof. First, we prove (a). On the one hand, the identity $\log = \Lambda * 1$ yields

$$\sum_{n \le x} \log n = \sum_{a \le x} \Lambda(a) \sum_{b \le x/a} 1 = \sum_{a \le x} \Lambda(a) \cdot \big(x/a + O(1)\big) = x \sum_{a \le x} \frac{\Lambda(a)}{a} + O(x),$$

where the error term was bounded using Chebyshev's estimate (Theorem 2.4). Since $\sum_{p^k \le x,\ k \ge 2} \Lambda(p^k)/p^k = O(1)$, we conclude that

$$\sum_{n \le x} \log n = x \sum_{p \le x} \frac{\log p}{p} + O(x).$$

On the other hand, we know that

$$\sum_{n \le x} \log n = x \log x - x + O(\log x)$$

by partial summation. This completes the proof of Mertens' first estimate.

To prove (b), we use (a) and partial summation. More precisely, let

$$\sum_{p \le x} \frac{\log p}{p} = \log x + R(x),$$

so that $R(x) = O(1)$. We then have


$$\sum_{p \le x} \frac{1}{p} = \int_{2^-}^x \frac{1}{\log t}\, d\big(\log t + R(t)\big) = \int_2^x \frac{dt}{t \log t} + \int_{2^-}^x \frac{dR(t)}{\log t}.$$

The first integral on the right-hand side equals $\log\log x - \log\log 2$. In the second integral, we integrate by parts to find that

$$\sum_{p \le x} \frac{1}{p} = \log\log x - \log\log 2 + \frac{R(x)}{\log x} - \frac{R(2^-)}{\log 2} + \int_2^x \frac{R(t)}{t \log^2 t}\, dt.$$

Since $R(2^-) = -\log 2$ and the integral $\int_2^\infty R(t)/(t \log^2 t)\, dt$ converges absolutely by the estimate $R(t) \ll 1$, we conclude that

$$\sum_{p \le x} \frac{1}{p} = \log\log x + c + \frac{R(x)}{\log x} - \int_x^\infty \frac{R(t)}{t \log^2 t}\, dt = \log\log x + c + O(1/\log x)$$

with $c = -\log\log 2 + 1 + \int_2^\infty R(t)/(t \log^2 t)\, dt$.


Finally, we prove (c). Using Taylor's theorem, we can write $\log(1 - y) = -y - f(y)$, where $f(y) = O(y^2)$ when $|y| \le 1/2$. In particular,

$$\log \prod_{p \le x} \Big(1 - \frac{1}{p}\Big) = -\sum_{p \le x} \frac{1}{p} - \sum_{p \le x} f(1/p).$$

The series $\sum_p f(1/p)$ converges and its tail satisfies the estimate

$$\sum_{p > x} f(1/p) = \sum_{p > x} O(1/p^2) = O(1/x).$$

Together with part (b), this yields the estimate

$$\log \prod_{p \le x} \Big(1 - \frac{1}{p}\Big) = -\log\log x - \kappa + O(1/\log x), \tag{3.6}$$

where $\kappa := c + \sum_p f(1/p)$. Exponentiating (3.6) gives (c) with $e^{-\kappa}$ in place of $e^{-\gamma}$, so it remains to show that $\kappa = \gamma$. This is proven using information about the analytic behavior of the Riemann zeta function around the point 1 (see Exercise 5.4).


Remark 3.5. Theorem 3.4(c) implies that the alleged main term in Theorem 2.1 when $m = P(\sqrt{x})$ is $\sim e^{-\gamma} x/\log\sqrt{x} = 2e^{-\gamma} x/\log x$ as $x \to \infty$. But $2e^{-\gamma} = 1.12291\ldots > 1$, so we cannot have that $\pi(x) \sim 2e^{-\gamma} x/\log x$, for this would contradict Theorem 3.4(a) by partial summation.

Corollary 3.6. As $n \to \infty$, we have

$$\varphi(n) \gtrsim e^{-\gamma} n/\log\log n.$$

Proof. This follows by Exercise 2.8(c) and Theorem 3.4(c).
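Mertens' estimates can be observed numerically. The minimal sketch below (names are illustrative) prints the three quantities normalized as in Theorem 3.4: the first should stay bounded, the second should settle near the constant $c$ of part (b), which is known to be about $0.2615$, and the third should approach $e^{-\gamma} \approx 0.5615$.

```python
from math import exp, log

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma


def mertens_quantities(x: int) -> tuple[float, float, float]:
    """Return the three quantities of Theorem 3.4, normalized:
    (a) sum_{p<=x} log(p)/p - log x,
    (b) sum_{p<=x} 1/p - log log x,
    (c) log x * prod_{p<=x} (1 - 1/p).
    """
    flags = [False, False] + [True] * (x - 1)
    for i in range(2, int(x ** 0.5) + 1):
        if flags[i]:
            for k in range(i * i, x + 1, i):
                flags[k] = False
    s_a, s_b, product = 0.0, 0.0, 1.0
    for p in range(2, x + 1):
        if flags[p]:
            s_a += log(p) / p
            s_b += 1 / p
            product *= 1 - 1 / p
    return s_a - log(x), s_b - log(log(x)), product * log(x)


for x in (10**3, 10**4, 10**5):
    a, b, c = mertens_quantities(x)
    print(x, round(a, 4), round(b, 4), round(c, 4))
print("e^(-gamma) =", round(exp(-EULER_GAMMA), 4))
```

The third column also illustrates Remark 3.5: twice its limit is about $1.123$, not 1.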


Exercises 


Exercise 3.1. Let $f$ be an arithmetic function. Prove that it has a Dirichlet inverse $g$ if and only if $f(1) \ne 0$, in which case $g$ can be calculated recursively by the formulas $g(1) = 1/f(1)$ and $g(n) = -f(1)^{-1} \sum_{d \mid n,\ d > 1} f(d) g(n/d)$ for $n > 1$.


Exercise 3.2. Let $\mathcal{A}$ and $\mathcal{M}$ denote the set of arithmetic and multiplicative functions, respectively. Prove that $(\mathcal{A}, +, *)$ is a unitary commutative ring and that $(\mathcal{M}, *)$ is an abelian group.




Exercise 3.3. Determine which of the following arithmetic functions are multi- 
plicative: 


fifn)=n; faln)=logn;  fa(n)=w?(n); falr) = Sod; 
din 

fs(n) = 73(n); fon) = (1); frm) = Anaoyar3— fa(n) = g(n)/n. 
Exercise 3.4. Let $f$ be a multiplicative function and $g$ its Dirichlet inverse.
(a) For a prime $p$, calculate $g(p)$ and $g(p^2)$ in terms of the values of $f$.
(b) If $f$ is completely multiplicative, show that $g = \mu f$.
Exercise 3.5. Prove the following variants of the Möbius inversion formula:
(a) Show that $f = 1 * g$ if and only if $g = \mu * f$.

(b) If $f = 1 * g$, then show that $g(p^k) = f(p^k) - f(p^{k-1})$ for all primes $p$ and all integers $k \ge 1$.

(c) Let $F, G : \mathbb{R}_{\ge 1} \to \mathbb{C}$. Prove that $F(x) = \sum_{n \le x} G(x/n)$ for all $x \ge 1$ if and only if $G(x) = \sum_{n \le x} \mu(n) F(x/n)$ for all $x \ge 1$.

Exercise 3.6. For $x \ge 1$, show that $\sum_{n \le x} \mu(n) \lfloor x/n \rfloor = 1$, and deduce that
$$\Big|\sum_{n \le x} \frac{\mu(n)}{n}\Big| \le 1.$$


Exercise 3.7. For each $k \in \mathbb{N}$, we define the $k$th generalized von Mangoldt function to be $\Lambda_k = \mu * \log^k$. Prove the following statements:

(a) $\Lambda_{k+1} = \Lambda_k \log + \Lambda * \Lambda_k$.
(b) $\Lambda_k$ is supported on integers $n$ with $\le k$ distinct prime factors.
(c) If $n = p_1 \cdots p_k$ for some distinct primes $p_1, \ldots, p_k$, then $\Lambda_k(n) = k!(\log p_1) \cdots (\log p_k)$.

(d) $0 \le \Lambda_k(n) \le 2^{k-1}(\log n)^k$ for each $n \in \mathbb{N}$.
Exercise 3.8. Find $f$ such that $\mu^2 = 1 * f$, and deduce that
$$\#\{n \le x : n \text{ is square-free}\} = cx + O(\sqrt{x}) \qquad (x \ge 1),$$
where $c = \sum_{n \ge 1} \mu(n)/n^2$. Then, use Exercise 2.4 to prove that $c = 6/\pi^2$.


Exercise 3.9. Show that there are constants $c_1, c_2 \in \mathbb{R}$ such that
$$\sum_{n \le x} \omega(n) = x \log\log x + c_1 x + O(x/\log x) \qquad (x \ge 3);$$
$$\sum_{n \le x} \Omega(n) = x \log\log x + c_2 x + O(x/\log x) \qquad (x \ge 3).$$
Conclude that, if $\epsilon : \mathbb{N} \to \mathbb{R}_{>0}$ is such that $\lim_{n \to \infty} \epsilon(n) = \infty$, then $\#\{n \le x : \Omega(n) > \omega(n) + \epsilon(n)\} = o_{x \to \infty}(x)$.

Exercise 3.10. Prove that, for every fixed integer $k \ge 3$, there is a polynomial $P_k$ of degree $k - 1$ and of leading coefficient $1/(k-1)!$ such that
$$\sum_{n \le x} \tau_k(n) = x P_k(\log x) + O_k(x^{1 - 1/k}) \qquad (x \ge 1).$$




Exercise 3.11. Let $S$ denote the set of square-full integers (see Exercise 1.6 for their definition).

(a) Show that $1_S(n) = \sum_{a^2 b^3 = n} \mu^2(b)$.

(b) Show that there are constants c1,c2 € R such that 


#8 [1,2] = ere/? + con? + O(2™), 
Exercise 3.12.
(a) Show that $2^\omega = 1 * \mu^2$ and deduce that there is a constant $c$ such that
$$\sum_{n \le x} 2^{\omega(n)} = \frac{6}{\pi^2} x \log x + cx + O(x^{2/3}) \qquad (x \ge 2).$$
(b*) Prove that the error term in (a) can be improved to $O(\sqrt{x} \log x)$.
Exercise 3.13. Estimate the sums p< (log p)*/p and Doss 1/p?. 


Exercise 3.14*. Prove that the Prime Number Theorem is equivalent to the existence of a constant $c$ such that
$$\sum_{p \le x} \frac{\log p}{p} = \log x + c + o(1) \qquad (x \to \infty). \tag{3.7}$$

Exercise 3.15* (Landau [124]). Recall the notation $\delta(n) = 1_{n=1}$. This exercise proves that the Prime Number Theorem is equivalent to the estimate
$$\sum_{n \le x} \mu(n) = o(x) \qquad (x \to \infty). \tag{3.8}$$
(a) (i) Show that $-\mu \log = \mu * (\Lambda - 1) + \delta$.
(ii) Assuming the Prime Number Theorem, prove (3.8). [Hint: Prove first that $\sum_{n \le x} \mu(n) \log n = o(x \log x)$ as $x \to \infty$.]
(b) Let $f(n) = \log n - \tau(n) + 2\gamma$.
(i) Show that $\Lambda - 1 = \mu * f - 2\gamma\delta$.
(ii) Show that $\sum_{n \le x} f(n) \ll \sqrt{x}$ for $x \ge 1$.
(iii) Assuming (3.8), prove the Prime Number Theorem.


Exercise 3.16*
(a) Prove there is a choice of constants $c_1, c_2$ for which $\sum_{n \le x} \big(\log^2 n - 2\tau_3(n) - c_1 \tau(n) - c_2\big) \ll x^{2/3}$ uniformly for $x \ge 1$.
(b) Recall the function $\Lambda_2 = \mu * \log^2$ from Exercise 3.7. Prove that it satisfies the estimate $\sum_{n \le x} \Lambda_2(n) = 2x \log x + O(x)$ for $x \ge 1$, and conclude that
$$\psi(x) \log x + \sum_{p \le x} \psi(x/p) \log p = 2x \log x + O(x) \qquad (x \ge 1).$$
(c) Show that
$$\limsup_{x \to \infty} \frac{\psi(x)}{x} + \liminf_{x \to \infty} \frac{\psi(x)}{x} = 2.$$
In particular, if $\lim_{x \to \infty} \psi(x)/x$ exists, then it must equal 1.
Exercise 3.17*. Show that the Prime Number Theorem is equivalent to the relation $\sum_{n=1}^\infty \mu(n)/n = 0$. [Hint: Exercises 3.6 and 3.15.]


Chapter 4 


Dirichlet series 


The ubiquity and importance of convolution identities in analytic number theory calls for a systematic way of discovering them. We can obtain a very satisfactory answer to this problem by developing the theory of Dirichlet series: to each arithmetic function $f$, we associate the generating function

$$F(s) = \sum_{n=1}^\infty \frac{f(n)}{n^s},$$

called the Dirichlet series of $f$. We do not concern ourselves with the convergence of this series for now, an issue that we will address at the end of the chapter. Rather, we treat $F(s)$ as a formal infinite series.

We write $\mathcal{D}$ for the set of formal Dirichlet series. If $G(s) = \sum_{n=1}^\infty g(n)/n^s$ is another element of $\mathcal{D}$, then we define

$$F(s) + G(s) = \sum_{n=1}^\infty \frac{f(n) + g(n)}{n^s} \quad \text{and} \quad F(s) G(s) = \sum_{n=1}^\infty \frac{(f * g)(n)}{n^s},$$

with the latter definition motivated by the formal calculation

$$F(s) G(s) = \sum_{a=1}^\infty \frac{f(a)}{a^s} \sum_{b=1}^\infty \frac{g(b)}{b^s} = \sum_{a,b \ge 1} \frac{f(a) g(b)}{(ab)^s} = \sum_{n=1}^\infty \frac{1}{n^s} \sum_{ab = n} f(a) g(b). \tag{4.1}$$

Evidently, the triplet $(\mathcal{D}, +, \cdot)$ forms a ring that is isomorphic to the ring of arithmetic functions $(\mathcal{A}, +, *)$. Hence, the study of the ring of arithmetic functions is equivalent to that of formal Dirichlet series.

In view of the above discussion, if we are given the functions $f$ and $g$ with Dirichlet series $F$ and $G$, respectively, and $g(1) \ne 0$, then the function $h$ solving the identity $f = g * h$ is the unique arithmetic function whose Dirichlet series is the quotient $F/G$. Hence, we are faced with the problem of inverting $G$.




When $g$ is multiplicative, this problem has a particularly elegant solution. The reason is that in this case $G$ satisfies the formal identity

$$\sum_{n=1}^\infty \frac{g(n)}{n^s} = \prod_p \Big(1 + \frac{g(p)}{p^s} + \frac{g(p^2)}{p^{2s}} + \cdots\Big), \tag{4.2}$$

called the Euler product of $G$. Before we discuss the formal proof of (4.2), note that it allows us to invert $G$ rather easily, since the factors of its Euler product are Taylor series in $z = p^{-s}$ (see Example 4.3 below). Moreover, we can estimate the coefficients of $1/G$ using Cauchy's residue theorem (see Exercise 4.10).

To see (4.2), we expand its right-hand side. We then obtain a formal sum of all products of the form

$$\frac{g(p_1^{a_1}) \cdots g(p_r^{a_r})}{(p_1^{a_1} \cdots p_r^{a_r})^s},$$

where $p_1 < \cdots < p_r$ are prime numbers, $a_1, \ldots, a_r \in \mathbb{Z}_{\ge 1}$ and $r \in \mathbb{Z}_{\ge 0}$. By multiplicativity, the numerator can be written as $g(p_1^{a_1} \cdots p_r^{a_r})$. The Fundamental Theorem of Arithmetic implies that the products $p_1^{a_1} \cdots p_r^{a_r}$ are in one-to-one correspondence with all natural numbers. This gives a formal proof of (4.2). A rigorous version will be given in the next section.


Example 4.1. The most important Dirichlet series is arguably the Riemann zeta function

$$\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}.$$

We will study it in great detail in Part 2. For now, note that

$$\zeta(s) = \prod_p \Big(1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots\Big) = \prod_p \Big(1 - \frac{1}{p^s}\Big)^{-1}. \tag{4.3}$$

To compute the inverse of $\zeta(s)$, we use the sequence of formal identities

$$\frac{1}{\zeta(s)} = \prod_p \Big(1 - \frac{1}{p^s}\Big) = \sum_{n=1}^\infty \frac{\mu(n)}{n^s}. \tag{4.4}$$

This formula can be considered as an analytic version of the Möbius inversion formula (3.1).


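Example 4.2. An alternative way of proving (3.2) is by noticing that

$$\sum_{n=1}^\infty \frac{\varphi(n)}{n^s} = \prod_p \Big(1 + \frac{p-1}{p^s} + \frac{p(p-1)}{p^{2s}} + \frac{p^2(p-1)}{p^{3s}} + \cdots\Big) = \prod_p \frac{1 - 1/p^s}{1 - 1/p^{s-1}} = \frac{\zeta(s-1)}{\zeta(s)}.$$

Since $\zeta(s-1) = \sum_{n \ge 1} n/n^s$ is the Dirichlet series of $\mathrm{id}$ and $1/\zeta(s)$ is that of $\mu$, this is equivalent to $\varphi = \mu * \mathrm{id}$.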


Example 4.3. Let $f$ be a multiplicative function. We will calculate its Dirichlet inverse $g$ using (4.2). We write $F(s)$ and $G(s)$ for the formal Dirichlet series of $f$ and $g$, respectively. Since $(f * g)(n) = 1_{n=1}$, the factors in the Euler product of $F(s)G(s)$ must all be equal to 1. Namely, we have

$$\Big(\sum_{k \ge 0} \frac{f(p^k)}{p^{ks}}\Big)\Big(\sum_{\ell \ge 0} \frac{g(p^\ell)}{p^{\ell s}}\Big) = 1.$$

Thus

$$\sum_{\ell \ge 0} \frac{g(p^\ell)}{p^{\ell s}} = \frac{1}{1 + \sum_{k \ge 1} f(p^k)/p^{ks}} = 1 + \sum_{j \ge 1} (-1)^j \Big(\sum_{k \ge 1} \frac{f(p^k)}{p^{ks}}\Big)^j. \tag{4.5}$$

Expanding the $j$th power and regrouping the summands according to the power of $p^s$ in the denominator, we find that

$$g(p^\ell) = \sum_{j=1}^{\ell} (-1)^j \sum_{\substack{k_1 + \cdots + k_j = \ell \\ k_1, \ldots, k_j \ge 1}} f(p^{k_1}) \cdots f(p^{k_j}) \qquad (\ell \ge 1). \tag{4.6}$$

Since the above calculations are purely formal, it might be reassuring to verify them in a more direct way. Indeed, using induction on $\ell$ and the fact that $\sum_{a + b = m} f(p^a) g(p^b) = 0$ for $m \ge 1$ yields a proof of (4.6), even in the case when $F(s)$ and $G(s)$ converge nowhere.
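The agreement between (4.6) and the recursion $\sum_{a+b=m} f(p^a) g(p^b) = 0$ can be checked mechanically. In this minimal sketch (names are illustrative), the list `f_pk` holds the values $f(p^k)$ for one fixed prime $p$, with $f(1) = 1$; formula (4.6) is evaluated by enumerating all compositions $k_1 + \cdots + k_j = \ell$, and the empty composition accounts for $g(1) = 1$. The test case $f = \tau$ has inverse $\mu * \mu$, whose values at $p^\ell$ are $1, -2, 1, 0, 0, \ldots$.

```python
from math import prod


def compositions(ell: int):
    """Yield every tuple (k_1, ..., k_j) of integers k_i >= 1 with sum ell."""
    if ell == 0:
        yield ()
        return
    for first in range(1, ell + 1):
        for rest in compositions(ell - first):
            yield (first,) + rest


def inverse_by_formula(f_pk: list[int], ell: int) -> int:
    """g(p^ell) via (4.6); the empty composition (ell = 0) gives g(1) = 1."""
    return sum((-1) ** len(ks) * prod(f_pk[k] for k in ks) for ks in compositions(ell))


def inverse_by_recursion(f_pk: list[int], L: int) -> list[int]:
    """g(p^0), ..., g(p^L) from sum_{a+b=m} f(p^a) g(p^b) = 0 for m >= 1."""
    g = [1] + [0] * L
    for m in range(1, L + 1):
        g[m] = -sum(f_pk[a] * g[m - a] for a in range(1, m + 1))
    return g


L = 8
tau_pk = [k + 1 for k in range(L + 1)]       # tau(p^k) = k + 1
print(inverse_by_recursion(tau_pk, L))        # [1, -2, 1, 0, 0, 0, 0, 0, 0]
print([inverse_by_formula(tau_pk, e) for e in range(L + 1)] == inverse_by_recursion(tau_pk, L))  # True
```

The recursion is far cheaper (the number of compositions of $\ell$ is $2^{\ell-1}$), which is why (4.6) is mainly useful as a closed formula rather than as an algorithm.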


Example 4.4. Taking logarithms formally in (4.3), we find that

$$\log \zeta(s) = \sum_p \log\Big(1 - \frac{1}{p^s}\Big)^{-1} = \sum_p \sum_{k \ge 1} \frac{1}{k p^{ks}}.$$

By formal differentiation, we are led to the formal identity

$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_p \sum_{k \ge 1} \frac{\log p}{p^{ks}}. \tag{4.7}$$

The right-hand side is the Dirichlet series of von Mangoldt's function $\Lambda$ we saw in the previous chapter. On the other hand, we have the formal identity

$$-\zeta'(s) = \sum_{n=1}^\infty \frac{\log n}{n^s}.$$

We thus guess that the left-hand side of (4.7) is the Dirichlet series of $\mu * \log$. This leads us to the convolution identity

$$\Lambda = \mu * \log. \tag{4.8}$$

Möbius inversion thus yields

$$\log = 1 * \Lambda. \tag{4.9}$$

This is relation (3.3), which we proved in a more combinatorial way before.




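Notice that we also have the variant

$$\Lambda = -1 * (\mu \log). \tag{4.10}$$

This formula can be proven using Möbius inversion:

$$\Lambda(n) = \sum_{d \mid n} \mu(d) \log(n/d) = \log n \sum_{d \mid n} \mu(d) - \sum_{d \mid n} \mu(d) \log d = -\sum_{d \mid n} \mu(d) \log d,$$

since $\log n \sum_{d \mid n} \mu(d) = \log n \cdot 1_{n=1} = 0$. Alternatively, we can also see (4.10) by formally differentiating $1/\zeta$.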


In conclusion, we can use formal manipulations of Dirichlet series to 
guess various convolution identities, which we can then also verify in a more 
direct way. 


Analytic properties of Dirichlet series 


We conclude this chapter with a study of the convergence of general Dirichlet series $\sum_{n=1}^\infty f(n)/n^s$. Following Riemann's notation, we always write

$$s = \sigma + it.$$

Note that $|n^s| = n^\sigma$. Thus, if $f(n) = O(n^\theta)$, then $\sum_{n=1}^\infty f(n)/n^s$ converges absolutely for $\sigma > \theta + 1$. Moreover, for each fixed $\epsilon > 0$, it converges uniformly for $\sigma \ge \theta + 1 + \epsilon$. Hence, it defines a holomorphic function in the half-plane $\sigma > \theta + 1$.

This simple argument can be vastly generalized: Dirichlet series converge in half-planes of the form $\sigma > \alpha$ and they define holomorphic functions in their domain of convergence.


Theorem 4.5. Let $F(s) = \sum_{n=1}^\infty f(n)/n^s$ be a Dirichlet series. If $F(s_0)$ converges for some complex number $s_0 = \sigma_0 + it_0$, then $F(s)$ converges uniformly in compact subsets of the half-plane $\sigma > \sigma_0$. In particular, it defines a holomorphic function there.

Proof. The proof is easier when the convergence at $s_0$ is absolute, so we first give it in this case. Note that $|f(n)/n^s| = |f(n)|/n^\sigma \le |f(n)|/n^{\sigma_0}$ if $\sigma \ge \sigma_0$. Weierstrass's M-test then implies that the series $\sum_{n=1}^\infty f(n)/n^s$ converges absolutely and uniformly for $\sigma \ge \sigma_0$.

We now give the proof of the general case, which is more delicate. We set $g(n) = f(n)/n^{s_0}$ and note that it suffices to show that the Dirichlet series $G(s) = \sum_{n \ge 1} g(n) n^{-s} = F(s + s_0)$ defines an analytic function in the half-plane $\sigma > 0$. For all $M > N \ge 1$, partial summation implies that

$$\sum_{N < n \le M} \frac{g(n)}{n^s} = \frac{1}{M^s} \sum_{N < n \le M} g(n) + s \int_N^M \Big(\sum_{N < n \le u} g(n)\Big) \frac{du}{u^{s+1}}. \tag{4.11}$$




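Since $\sum_{n=1}^\infty g(n)$ converges, for each $\epsilon > 0$ we can find some $N_0$ such that

$$\Big|\sum_{N < n \le M} g(n)\Big| \le \epsilon \qquad (M > N \ge N_0).$$

As a consequence, we have

$$\Big|\sum_{N < n \le M} \frac{g(n)}{n^s}\Big| \le \frac{\epsilon}{M^\sigma} + |s| \int_N^M \frac{\epsilon}{u^{\sigma+1}}\, du \le \epsilon + \frac{\epsilon |s|}{\sigma} \qquad (M > N \ge N_0,\ \sigma > 0).$$

This clearly proves that, viewed as a series of functions, $\sum_{n=1}^\infty g(n)/n^s$ converges uniformly in compact subsets of the half-plane $\sigma > 0$. Indeed, if $K$ is such a compact set, then there are numbers $\delta > 0$ and $B \ge 1$ such that $\sigma \ge \delta$ and $|s| \le B$ for all $s \in K$, so that the above bound is at most $\epsilon(1 + B/\delta)$ uniformly on $K$. The analyticity of $G$ follows readily.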


The above theorem naturally leads us to attach to a Dirichlet series $F(s) = \sum_{n=1}^\infty f(n)/n^s$ the quantity

$$\sigma_c = \sigma_c(F) := \inf\{\sigma \in \mathbb{R} : \exists\, t \in \mathbb{R} \text{ such that } F(\sigma + it) \text{ converges}\},$$

called the abscissa of convergence of $F$. Theorem 4.5 implies that $F$ defines a holomorphic function in the half-plane $\sigma > \sigma_c$. We further define the abscissa of absolute convergence of $F$ by

$$\sigma_a = \sigma_a(F) := \inf\{\sigma \in \mathbb{R} : F(\sigma) \text{ converges absolutely}\}.$$

For example, if $F = \zeta$, then $\sigma_c = \sigma_a = 1$. The properties of $\sigma_c$ and $\sigma_a$ are studied in the exercises.


A lot of the formal calculations we saw earlier can be rigorously justified 
when the involved Dirichlet series converge absolutely. For example, this is 
true for relation (4.1). In particular, we may rigorously prove that 


(4.12) 5 tn) 


Jane Os)’ 
where Re(s) > 1. Taking s = 2 yields a more direct proof of the identity 
yo, w(n)/n? = 6/7? that we saw in Exercise 


Similarly, the Euler product representation of Dirichlet series of multiplicative functions can be rigorously proven in their domain of absolute convergence. Firstly, let us consider the case of the Riemann zeta function. If $\mathrm{Re}(s) > 1$, then the absolute convergence of the series $\sum_{n\ge1}n^{-s}$ allows us to sum its terms in any order. In particular, if we let $V(y,k) = \{n \in \mathbb{N} : p^v \| n \Rightarrow p \le y \text{ and } v \le k\}$, then

$$\zeta(s) = \lim_{y\to\infty}\lim_{k\to\infty}\sum_{n\in V(y,k)}\frac{1}{n^s} = \lim_{y\to\infty}\lim_{k\to\infty}\prod_{p\le y}\Big(1 + \frac{1}{p^s} + \cdots + \frac{1}{p^{ks}}\Big) = \prod_p\Big(1 - \frac{1}{p^s}\Big)^{-1}.$$

This establishes relation (4.3) when $\mathrm{Re}(s) > 1$.
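The double limit above can be watched numerically at $s = 2$, where $\zeta(2) = \pi^2/6$. This is a sketch with illustrative names. It multiplies the truncated local factors $1 + p^{-s} + \cdots + p^{-ks}$ over $p \le y$, or the full factors $(1 - p^{-s})^{-1}$ when no $k$ is given. Increasing $k$ and then $y$ pushes the product up towards $\zeta(2)$.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def truncated_euler_product(s, y, k=None):
    """Product over p <= y of 1 + p^-s + ... + p^-ks, or of (1 - p^-s)^-1 if k is None."""
    result = 1.0
    for p in primes_up_to(y):
        if k is None:
            result /= 1.0 - p ** (-s)
        else:
            result *= sum(p ** (-j * s) for j in range(k + 1))
    return result


zeta2 = math.pi ** 2 / 6
# e.g. truncated_euler_product(2, 10**4) differs from zeta2 by about 2e-5
```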




The above argument can be easily generalized. We leave the details as an exercise. (It is highly recommended to first solve Exercise 1.14.)

Theorem 4.6. Let $f$ be a multiplicative function and $s \in \mathbb{C}$. The series $\sum_{n=1}^\infty f(n)/n^s$ converges absolutely if and only if so does the double series $\sum_p\sum_{k\ge1}f(p^k)/p^{ks}$. When they both converge absolutely, we have

$$\sum_{n=1}^\infty\frac{f(n)}{n^s} = \prod_p\Big(1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \cdots\Big).$$


Remark 4.7. (a) It is important to emphasize here that the assumption of absolute convergence is crucial to represent a Dirichlet series of a multiplicative function as an Euler product. For example, the function $f(n) = (-1)^{n-1}$ is multiplicative. Its Dirichlet series $F(s)$ converges absolutely for $\sigma > 1$ and conditionally for $\sigma > 0$ by (4.11) with $g = f$, since $\sum_{n\le y}f(n) = O(1)$ for all $y \ge 1$. However, its Euler product is $(1 - 1/2^s - 1/2^{2s} - \cdots)\prod_{p>2}(1 + 1/p^s + 1/p^{2s} + \cdots)$, and the product over odd primes diverges to $\infty$ for $s \in (0,1]$, because $\sum_{p>2}1/p = \infty$ by Theorem 3.4(b).

(b) Knowing that $F(s)$ can be written as an absolutely convergent Euler product at some point $s$ makes it very easy to check whether $F(s)$ vanishes: we simply need to check whether one of the factors vanishes (see Exercise 1.14(b)). For example, since $\zeta(s) = \prod_p(1 - 1/p^s)^{-1}$ for $\sigma > 1$, we have that $\zeta(s) \ne 0$ for $\sigma > 1$. As we will see in the next chapter, the location of the zeroes of $\zeta$ is intimately related to the distribution of prime numbers.
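Remark 4.7(a) can be seen numerically at $s = 1$. The sketch below uses illustrative names. The alternating series $\sum(-1)^{n-1}/n$ settles down to $\log 2$, a standard fact. Meanwhile the partial products $\prod_{2<p\le y}(1 - 1/p)^{-1}$ over odd primes keep growing, roughly like $\log y$ by Mertens' estimates. So the Euler product cannot represent $F(1)$.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def alternating_partial_sum(N, s=1.0):
    """sum_{n <= N} (-1)^(n-1) / n^s, a partial sum of F(s) for f(n) = (-1)^(n-1)."""
    return sum((-1) ** (n - 1) / n ** s for n in range(1, N + 1))


def odd_prime_factor_product(y, s=1.0):
    """prod_{2 < p <= y} (1 + p^-s + p^-2s + ...) = prod_{2 < p <= y} (1 - p^-s)^-1."""
    result = 1.0
    for p in primes_up_to(y):
        if p > 2:
            result /= 1.0 - p ** (-s)
    return result
```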


Exercises 


Exercise 4.1. (a) Find $f$ and $g$ such that $\sigma = \varphi * f$ and $\varphi/\mathrm{id} = 1 * g$.
(b) Use (4.5) and (4.6) to calculate the Dirichlet inverses of $\tau$, $2^{\omega}$ and $\varphi$.


Exercise 4.2. If f and g are Dirichlet inverses of each other, then find a non- 
recursive formula for the values of g in terms of the values of f. 


Exercise 4.3. Let $F(s) = \sum_{n=1}^\infty f(n)/n^s$ be a Dirichlet series, and let $\sigma_c$ and $\sigma_a$ be its abscissas of convergence and of absolute convergence, respectively.

(a) Prove that $\sigma_c \le \sigma_a \le \sigma_c + 1$.

(b) Prove that $\sigma_c < +\infty$ if and only if there is $\theta \in \mathbb{R}$ such that $f(n) = O(n^\theta)$ for all $n \in \mathbb{N}$.


Exercise 4.4. Compute the Dirichlet series associated to the functions fi, ..., fs 
from Exercise 3.3; your answer could be given in terms of $\zeta$. Then, determine their
abscissas of convergence and of absolute convergence. 




Exercise 4.5. Show that there is a constant $\delta \in (0,1)$ and a polynomial $P$ of degree 3 and of leading coefficient $1/\pi^2$ such that

$$\sum_{n\le x}\tau(n)^2 = xP(\log x) + O(x^{\delta}).$$

[Hint: Write $\tau^2 = \tau_4 * f$ and use Exercise .10.]
Exercise 4.6. Let $f$ be an arithmetic function, and let $F(s)$ be its formal Dirichlet series. Define $f'(n) := -f(n)\log n$, and let $F'(s)$ be its Dirichlet series. Prove that $(f*g)' = f'*g + f*g'$ and $(FG)' = F'G + FG'$.


Exercise 4.7. Let $f$ be a multiplicative function with formal Dirichlet series $F$, and define $\Lambda_f$ via the convolution identity $f\log = \Lambda_f * f$.

(a) Prove that the formal Dirichlet series of $\Lambda_f$ is $-F'/F$.

(b) Prove that $\Lambda_f$ is supported on prime powers, and that $\Lambda_f(p) = f(p)\log p$ for all primes $p$. [Hint: If $F = \prod_p E_p$ is the Euler product of $F$, we have the formal identity $F'/F = \sum_p E_p'/E_p$.]

(c) Calculate $\Lambda_f$ when $f$ is completely multiplicative.

(d) Calculate $\Lambda_f$ when $F(s) = \prod_p(1 - 1/p^s)^{-f(p)}$.

Exercise 4.8. Let $F(s) = \sum_{n=1}^\infty f(n)/n^s$ be a Dirichlet series with abscissa of convergence $\sigma_c < +\infty$. Prove that the abscissa of convergence for the series of derivatives $-\sum_{n=1}^\infty f(n)(\log n)/n^s$ is also $\sigma_c$. Deduce that $F'(s) = -\sum_{n=1}^\infty f(n)(\log n)/n^s$ when $\mathrm{Re}(s) > \sigma_c$.

Exercise 4.9. If $F(s) = \sum_{n\ge1}f(n)/n^s$ and $G(s) = \sum_{n\ge1}g(n)/n^s$ converge and are equal in the half-plane $\mathrm{Re}(s) > \alpha$, then prove that $f = g$.

Exercise 4.10. Let $f$ be a multiplicative function, and let $g$ be its Dirichlet inverse. Fix a prime $p$ and assume that there is some $M > 0$ such that $|f(p^k)| \le M^k$ for all $k \in \mathbb{Z}_{\ge1}$.

(a) Show that the power series $\sum_{k\ge0}f(p^k)z^k$ converges absolutely for $|z| < 1/M$ and does not vanish for $|z| < 1/(2M)$.

(b) If $0 < r < 1/(2M)$, then show that

$$g(p^k) = \frac{1}{2\pi i}\oint_{|z|=r}\frac{1}{1 + f(p)z + f(p^2)z^2 + \cdots}\cdot\frac{dz}{z^{k+1}}.$$

(c) Let $\varepsilon > 0$. Prove that $g(p^k) \ll_{\varepsilon,M} (2M + \varepsilon)^k$ for all $k \in \mathbb{Z}_{\ge1}$.

(d) When $f(n) = (-1)^{n-1}$, compute $g(p^k)$ for all primes $p$. What do you observe when you compare $g(p^k)$ with the estimate of part (c)?

Exercise 4.11.* Let $F(s) = \sum_{n=1}^\infty f(n)/n^s$ and $G(s) = \sum_{n=1}^\infty g(n)/n^s$ be two Dirichlet series with $F(s)G(s) = 1$. If $F$ has abscissa of convergence $< +\infty$, is it true that $G$ also has abscissa of convergence $< +\infty$?


Part 2 


Methods of complex 
and harmonic analysis 


Chapter 5 


An explicit formula for 
counting primes 


So far, we have seen various ways of counting primes using combinatorial 
devices. We now introduce a different approach that transforms the problem 
of estimating $\pi(x)$ into a problem in complex analysis. The key idea is to
package the primes all together and form an appropriate generating function. 


Given an arithmetic function $f$, the most common generating function attached to $f$ is arguably its power series

$$A(z) = \sum_{n\ge1}f(n)z^n.$$

If this series has a positive radius of convergence $R$, it converges to a holomorphic function in the disk $|z| < R$. Moreover, $f(n)$ can be recovered from $A(z)$ via Cauchy's residue formula, which implies that

$$f(n) = \frac{1}{2\pi i}\oint_{|z|=r}\frac{A(z)}{z^{n+1}}\,dz \qquad (n \in \mathbb{N},\ 0 < r < R). \tag{5.1}$$

We apply (5.1) when $f = 1_{\mathbb{P}}$, the indicator function of the sequence of primes. The associated power series is

$$Q(z) = \sum_p z^p.$$

Summing (5.1) for $n = 0, 1, \dots, N$ when $f = 1_{\mathbb{P}}$ yields the inversion formula

$$\sum_{p\le N}1 = \sum_{0\le n\le N}\frac{1}{2\pi i}\oint_{|z|=r}\frac{Q(z)}{z^{n+1}}\,dz = \frac{1}{2\pi i}\oint_{|z|=r}\frac{Q(z)(1 - z^{N+1})}{z^{N+1}(1 - z)}\,dz \tag{5.2}$$

for any $r \in (0,1)$. Hence, a good understanding of the analytic behavior of $Q(z)$ can lead us to precise estimates for the counting function of the primes.
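Formula (5.2) can be evaluated numerically, which makes the passage from $Q(z)$ to prime counts concrete. This is a minimal sketch under stated assumptions. $Q$ is truncated to the polynomial $\sum_{p\le N}z^p$, which does not change the integral. The contour integral is approximated by the trapezoid rule with $M > N$ equally spaced nodes on $|z| = r$. For this integrand the rule is exact up to rounding, because only the $z^0$ term survives averaging over the nodes. The choice $r = 1 - 1/(N+1)$ is an assumption made for floating-point stability: it keeps $|z|^{-N}$ bounded. All names are illustrative.

```python
import cmath
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def count_primes_via_contour(N, r=None, M=None):
    """Trapezoid-rule evaluation of the right side of (5.2) on the circle |z| = r."""
    if r is None:
        r = 1 - 1 / (N + 1)  # keeps |z|^(-N) of size at most about e
    if M is None:
        M = 2 * (N + 1)  # more nodes than the degree, so no aliasing occurs
    primes = primes_up_to(N)
    total = 0j
    for j in range(M):
        z = r * cmath.exp(2j * math.pi * j / M)
        Q = sum(z ** p for p in primes)
        kernel = (1 - z ** (N + 1)) / (z ** (N + 1) * (1 - z))
        # (1 / 2πi) dz = (z / 2π) dθ, so each node carries weight z / M
        total += Q * kernel * z
    return total / M
```

For example, `count_primes_via_contour(100)` returns a complex number within rounding error of $25 = \pi(100)$.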




The above strategy arrives quickly at a dead end because it is not clear 
how to control the function Q(z) without already knowing a lot about 
primes. As a matter of fact, the same objection can be raised for any 
generating function associated to the sequence of primes: how is it possible 
to determine its asymptotic behavior without already having a good grasp 
of the distribution of primes? 


To break the vicious circle, we analyze $Q(z)$ more closely. This function is naturally tied to the additive structure of the sequence of prime numbers. For example, note that

$$Q(z)^k = \sum_{p_1,\dots,p_k}z^{p_1+\cdots+p_k} = \sum_{n\ge0}g_k(n)z^n,$$

where $g_k(n)$ is the number of ways to write $n$ as the sum of $k$ primes. However, primes are multiplicative objects, so it is more natural to study them from a multiplicative point of view. To this end, we observe that the logarithmic function is a group isomorphism from $(\mathbb{R}_{>0},\times)$ to $(\mathbb{R},+)$. We are thus naturally led to consider the generating function

$$\sum_p z^{\log p}.$$

This is no longer a power series because the exponents are not integers.

Note that $z^{\log p} = p^{\log z}$. Working with the complex logarithm causes technical difficulties. For this reason, we make the change of variables $s = -\log z$, so that our generating function becomes the Dirichlet series

$$\mathcal{P}(s) = \sum_p\frac{1}{p^s}.$$


In view of Mertens' second estimate (Theorem 3.4), this Dirichlet series has abscissa of convergence 1. In particular, Theorem 4.5 tells us that it defines a holomorphic function in the half-plane $\mathrm{Re}(s) > 1$.

Let us now consider the $k$th power of $\mathcal{P}$: we have

$$\mathcal{P}(s)^k = \sum_{p_1,\dots,p_k}\frac{1}{(p_1\cdots p_k)^s} = \sum_{n\ge1}\frac{r_k(n)}{n^s},$$

where $r_k(n)$ is the number of ways to write $n$ as the product of $k$ primes. In particular, $r_k$ is supported on integers with at most $k$ prime factors. In comparison, before we had no control over the support of $g_k$. We thus see right away that $\mathcal{P}(s)$ has better properties than $Q(z)$.
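To make the contrast between $g_k$ and $r_k$ concrete, here is a small brute-force sketch over ordered $k$-tuples of primes. The function names are illustrative. It shows that $r_k(n) \ne 0$ only when $n$ has exactly $k$ prime factors counted with multiplicity. No comparable structural restriction is visible for $g_k$.

```python
import math
from collections import Counter
from itertools import product


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def additive_counts(k, N):
    """g_k(n) for n <= N: ordered k-tuples of primes with p_1 + ... + p_k = n."""
    counts = Counter()
    for combo in product(primes_up_to(N), repeat=k):
        total = sum(combo)
        if total <= N:
            counts[total] += 1
    return counts


def multiplicative_counts(k, N):
    """r_k(n) for n <= N: ordered k-tuples of primes with p_1 * ... * p_k = n."""
    counts = Counter()
    for combo in product(primes_up_to(N), repeat=k):
        value = math.prod(combo)
        if value <= N:
            counts[value] += 1
    return counts


def big_omega(n):
    """Number of prime factors of n counted with multiplicity."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    return count + (1 if n > 1 else 0)


g2 = additive_counts(2, 100)
r2 = multiplicative_counts(2, 100)
r3 = multiplicative_counts(3, 100)
```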

Taking the above argument one step further, Euler proved that $\mathcal{P}(s)$ can be written in terms of the Riemann zeta function $\zeta(s) = \sum_{n=1}^\infty 1/n^s$, which is for $\mathbb{N}$ what $\mathcal{P}(s)$ is for the sequence of primes. The key is Euler's product formula

$$\zeta(s) = \prod_p\Big(1 - \frac{1}{p^s}\Big)^{-1} \qquad (\mathrm{Re}(s) > 1)$$

that we proved in the previous chapter. Taking logarithms, we infer that

$$\log\zeta(s) = \sum_p\sum_{m\ge1}\frac{1}{mp^{ms}} = \sum_{m\ge1}\frac{\mathcal{P}(ms)}{m}, \tag{5.3}$$

which provides the link between $\mathcal{P}$ and $\zeta$. The above formula is the starting point of analytic number theory, as it relates the function $\mathcal{P}$, about which we knew nothing, to the function $\zeta$. The latter is significantly simpler because it is defined as a summation over all integers, a very regular set. It thus seems plausible that we can obtain good estimates for $\mathcal{P}$ via this link.
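Relation (5.3) can be checked numerically at $s = 2$, where $\log\zeta(2) = \log(\pi^2/6)$. This is a sketch with illustrative names. $\mathcal{P}$ is truncated to primes $p \le y$ and the outer sum to $m \le m_{\max}$. Both truncation errors are far below the tolerance used: the prime tail is about $1/(y\log y)$, and $\mathcal{P}(2m)$ decays like $4^{-m}$.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def prime_zeta(s, primes):
    """Truncated P(s) = sum over the given primes of p^-s (real s > 1)."""
    return sum(float(p) ** (-s) for p in primes)


def log_zeta_from_prime_zeta(s, y=10**5, m_max=40):
    """Right side of (5.3), sum_{m >= 1} P(ms)/m, truncated at p <= y and m <= m_max."""
    primes = primes_up_to(y)
    return sum(prime_zeta(m * s, primes) / m for m in range(1, m_max + 1))
```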


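As in the case of the function $Q(z)$ and the inversion formula (5.2), we want to find a passage from $\mathcal{P}(s)$ to $\pi(x) = \sum_{p\le x}1$. We start by writing

$$\mathcal{P}(s) = \int_1^\infty y^{-s}\,d\pi(y) = s\int_1^\infty\pi(y)y^{-s-1}\,dy \qquad (\mathrm{Re}(s) > 1). \tag{5.4}$$

Hence, we see that the function $\mathcal{P}(-s)/(-s)$ is the Mellin transform of the function $\pi(x)$. (A brief introduction to the necessary theory of the Mellin transform is given in the last section of Appendix B.) Mellin inversion allows us to invert this relation and obtain the formula

$$\sum_{p<x}1 + \frac{1_{\mathbb{P}}(x)}{2} = \frac{1}{2\pi i}\int_{(\alpha)}\mathcal{P}(s)\frac{x^s}{s}\,ds \qquad (x > 0,\ \alpha > 1), \tag{5.5}$$

where $\int_{(\alpha)}f(s)\,ds$ denotes the principal value of $\int_{\mathrm{Re}(s)=\alpha}f(s)\,ds$, namely

$$\int_{(\alpha)}f(s)\,ds = \lim_{T\to\infty}\int_{\substack{\mathrm{Re}(s)=\alpha\\ |\mathrm{Im}(s)|\le T}}f(s)\,ds. \tag{5.6}$$

Indeed, to see (5.5), we apply the Mellin inversion theorem of Appendix B (whose hypotheses are met here with $\alpha_1 = -\infty$ and $\alpha_2 = -1$) and then make the change of variables $s \to -s$.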


Jumping into the void 


The inversion formula (5.5) expresses $\pi(x)$ in terms of the Riemann zeta function. However, it is not that useful as it stands for the estimation of $\pi(x)$. Indeed, we expect that there are about $x/\log x$ primes $\le x$. On the other hand, we have $|x^s| = x^\alpha$ on the right side of (5.5). Since $\alpha > 1$, the size of $x^s$ is bigger than the expected main term. This means that if we are to extract an asymptotic estimate for $\pi(x)$ from (5.5), we must understand the integrand in a way that is precise enough to establish significant cancellation among the different parts of the range of integration. Obtaining such sharp estimates on $\mathcal{P}$ without already controlling $\pi(x)$ seems impossible.




It thus seems that we have again reached an impasse. Riemann, though, had a brilliant idea to circumvent it. He realized that $\zeta(s)$ can be extended in a canonical way to values of $s$ outside its domain of convergence using the theory of analytic and meromorphic continuation of complex functions.¹ We do not need to delve too deeply into this theory; as we will see shortly, the special structure of $\zeta$ allows us to meromorphically continue it² to $\mathbb{C}$ relatively easily. The extension we obtain has only one singularity: a simple pole of residue 1 at $s = 1$. Such an extension must be unique by the identity principle. Thus $\zeta$ really is well-defined over $\mathbb{C}$. Using this fact and Cauchy's residue theorem, we can then replace the line of integration by a new contour that reaches to the left of the vertical line $\mathrm{Re}(s) = 1$, where $x^s$ becomes of smaller magnitude than $x$. Hence, we can hope to obtain bounds for this new integral that are of genuinely smaller order than $x/\log x$. The main term in the approximation of $\pi(x)$ will arise from the singularities in the region enclosed by the old and the new contours of integration. The end result of this calculation will be a formula for $\pi(x)$ in terms of the singularities of $\mathcal{P}$.

¹In fact, this theory was partly pioneered by Riemann himself.

²The YouTube channel 3Blue1Brown has an excellent video about the meromorphic continuation of $\zeta$ that is called "Visualizing the Riemann hypothesis and analytic continuation". The video is located at the web address https://www.youtube.com/watch?v=sDONjbwqlYw.

We devote the rest of this chapter to making the above discussion more precise and to laying Riemann's idea on rigorous mathematical grounds.

The meromorphic continuation of ζ

Perhaps the simplest way of meromorphically continuing $\zeta$ is to use the Euler–Maclaurin formula. Indeed, when $\mathrm{Re}(s) > 1$, $\zeta(s)$ is defined as the sum of the smooth function $1/n^s$ over $n \ge 1$, so the Euler–Maclaurin formula implies that

$$\zeta(s) = \frac{s}{s-1} - s\int_1^\infty\frac{\{y\}}{y^{s+1}}\,dy. \tag{5.7}$$

The integral on the right side converges absolutely for $\mathrm{Re}(s) > 0$ because $\{y\}$ is bounded. Thus, the right side of this formula supplies a meromorphic continuation of $\zeta$ to the half-plane $\mathrm{Re}(s) > 0$. The only singularity of $\zeta$ in this half-plane is a simple pole at $s = 1$ of residue 1 (a reflection of the divergence of the harmonic series $\sum_{n=1}^\infty 1/n$).

More generally, Exercise 1.10(b) implies that, for $k \ge 2$,

$$\zeta(s) = \frac{1}{s-1} + \frac{1}{2} + \sum_{j=2}^{k}\frac{B_j}{j!}\prod_{i=0}^{j-2}(s+i) - \frac{1}{k!}\prod_{i=0}^{k-1}(s+i)\int_1^\infty\frac{B_k(\{y\})}{y^{s+k}}\,dy \tag{5.8}$$

for $\mathrm{Re}(s) > 1$. Since the right side is meromorphic for $\mathrm{Re}(s) > 1 - k$ with only a simple pole at $s = 1$ of residue 1, so is $\zeta$. Letting $k \to \infty$ establishes the claimed meromorphic continuation of $\zeta$ to the entire complex plane.
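The Euler–Maclaurin approach also gives a practical way to evaluate $\zeta$ anywhere in the plane. The sketch below is a standard variant of (5.8) applied from $n = N$ onward instead of from $n = 1$. It keeps the Bernoulli terms $B_2, \dots, B_{2m}$ and drops the remainder integral. It is an approximation, not an exact formula. It is valid for $\mathrm{Re}(s) > 1 - 2m$, $s \ne 1$, and accurate when $|s|$ is small compared with $2\pi N$. The function name and defaults are illustrative. It recovers $\zeta(2) = \pi^2/6$, $\zeta(0) = -1/2$, $\zeta(-1) = -1/12$, the trivial zero at $-2$, and (numerically) the first non-trivial zero near $1/2 + 14.1347\,i$.

```python
import math
from fractions import Fraction

# Standard Bernoulli numbers B_2, B_4, ..., B_20
BERNOULLI_EVEN = [Fraction(1, 6), Fraction(-1, 30), Fraction(1, 42), Fraction(-1, 30),
                  Fraction(5, 66), Fraction(-691, 2730), Fraction(7, 6),
                  Fraction(-3617, 510), Fraction(43867, 798), Fraction(-174611, 330)]


def zeta_em(s, N=20, m=10):
    """Euler-Maclaurin approximation of ζ(s) for complex s != 1 (needs Re(s) > 1 - 2m)."""
    s = complex(s)
    if s == 1:
        raise ValueError("ζ has a pole at s = 1")
    total = sum(n ** (-s) for n in range(1, N))  # exact head of the series
    total += N ** (1 - s) / (s - 1) + N ** (-s) / 2  # integral and endpoint terms
    rising = s  # s(s+1)...(s+2k-2), starting with k = 1
    factorial = 2  # (2k)!
    for k in range(1, m + 1):
        b = float(BERNOULLI_EVEN[k - 1])
        total += b / factorial * rising * N ** (1 - s - 2 * k)
        rising *= (s + 2 * k - 1) * (s + 2 * k)
        factorial *= (2 * k + 1) * (2 * k + 2)
    return total
```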


Let us now examine what the above discussion tells us about the analytic character of $\mathcal{P}$. We start from relation (5.3). Since $\sum_{m\ge2}\mathcal{P}(ms)/m = \sum_{m\ge2,\,p}1/(mp^{ms}) = O(1)$ for $\mathrm{Re}(s) > 1$, we find that $\mathcal{P}(s) = \log\zeta(s) + O(1)$ for $\mathrm{Re}(s) > 1$. In particular, $\mathcal{P}(s) \sim -\log(s-1)$ as $s \to 1$, that is to say, $\mathcal{P}(s)$ has a logarithmic singularity at $s = 1$. This type of singularity prohibits us from extending $\mathcal{P}$ to an analytic function around $s = 1$. In particular, we cannot apply Cauchy's residue theorem to an integral of the form $\oint_C\mathcal{P}(s)x^s/s\,ds$, where $C$ is a closed contour going around 1. For this reason, extracting the main term for $\pi(x)$ from the inversion formula for $\pi(x)$ is a bit hard (though certainly possible, as Riemann himself explained in his 1859 manuscript).

The above obstacle is merely of a technical nature. To overcome it, recall that the asymptotic behavior of $\pi(x)$ can be extracted from that of Chebyshev's theta and psi functions

$$\theta(x) = \sum_{p\le x}\log p \qquad\text{and}\qquad \psi(x) = \sum_{n\le x}\Lambda(n).$$

Indeed, we saw in Examples 1.8 and 1.9 how to go back and forth between $\pi(x)$ and $\theta(x)$. In addition, Chebyshev's functions are very close to each other by virtue of Exercise 2.7, which implies that

$$|\theta(x) - \psi(x)| \ll \sqrt{x} \qquad (x \ge 2).$$

Therefore, instead of estimating $\pi(x)$, we may work with $\psi(x)$. We need an analogue of formula (5.5) for this function.

In general, a straightforward adaptation of the proof of (5.5) implies the following generalization: if $f$ is an arithmetic function whose Dirichlet series $F$ converges absolutely in the half-plane $\mathrm{Re}(s) > 1$, then

$$\sum_{n<x}f(n) + \frac{1_{\mathbb{N}}(x)}{2}f(x) = \frac{1}{2\pi i}\int_{(\alpha)}F(s)\frac{x^s}{s}\,ds \qquad (x > 0,\ \alpha > 1). \tag{5.9}$$


This general identity is called the Perron inversion formula. 
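At the heart of (5.9) is the discontinuous integral $\frac{1}{2\pi i}\int_{(\alpha)}y^s\,ds/s$. This integral equals 1 for $y > 1$, 0 for $0 < y < 1$ and $1/2$ for $y = 1$. Applying it with $y = x/n$ and summing against $f(n)$ gives (5.9). The sketch below approximates the truncated integral over $|\mathrm{Im}(s)| \le T$ with the midpoint rule. With $s = \alpha + it$ we have $ds = i\,dt$, and the integrand at $-t$ is the conjugate of that at $t$. The function name and step sizes are illustrative choices, not from the text. For $y \ne 1$ the truncation error is of order $y^\alpha/(T|\log y|)$.

```python
import cmath
import math


def truncated_perron_kernel(y, a=1.0, T=2000.0, h=0.05):
    """Approximate (1/2πi) ∫_{a-iT}^{a+iT} y^s / s ds for y > 0 by the midpoint rule.

    Writing s = a + it, the integral equals (1/π) ∫_0^T Re(y^(a+it) / (a+it)) dt.
    """
    log_y = math.log(y)
    steps = int(T / h)
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h  # midpoint of the k-th subinterval of [0, T]
        s = complex(a, t)
        total += (cmath.exp(s * log_y) / s).real
    return total * h / math.pi
```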


We apply (5.9) with $f = \Lambda$, whose summatory function is Chebyshev's psi function. The associated Dirichlet series is $-\zeta'/\zeta$. Since $\zeta$ is meromorphic over $\mathbb{C}$, so is $-\zeta'/\zeta$. They both have a simple pole of residue 1 at $s = 1$. Moreover, if $z$ is a zero of $\zeta$ of multiplicity $m$, then $\zeta'/\zeta$ has a simple pole of residue $m$ at $s = z$. Indeed, we may write $\zeta(s) = (s-z)^m g(s)$ with $g$ analytic and non-zero in a neighborhood of $z$. Hence, $(\zeta'/\zeta)(s) = m/(s-z) + (g'/g)(s)$ and $g'/g$ is analytic around $z$. This implies that

$$\operatorname{res}_{s=z}(\zeta'/\zeta)(s) = m. \tag{5.10}$$




As we will see in the next chapter, the zeroes of $\zeta$ fall under two categories: the trivial zeroes, which are located at $-2, -4, -6, \dots$, and the non-trivial zeroes, which are located in the strip $0 \le \mathrm{Re}(s) \le 1$. We denote a generic non-trivial zero by³ $\rho = \beta + i\gamma$.

Remarkably, there is an explicit formula for $\psi(x)$ in terms of the non-trivial zeroes of the Riemann zeta function.

Theorem 5.1. For all $x, T \ge 2$, we have

$$\psi(x) = x - \sum_{|\gamma|\le T}\frac{x^\rho}{\rho} + O\Big(\frac{x}{T}\log^2(xT) + \log x\Big), \tag{5.11}$$

where the sum runs over the non-trivial zeroes $\rho = \beta + i\gamma$ of $\zeta$ with $|\gamma| \le T$, with each zero repeated as many times as its multiplicity.


Before we explain why this theorem is true, let us pause momentarily and make a few comments about it. This astonishing result reveals that primes, an elementary arithmetic object, have a "dual" complex-analytic object associated to them: the zeroes of $\zeta$. These two objects of seemingly unrelated nature are interconnected in a fundamental way: the main term on the right-hand side of the explicit formula approximates $\psi(x)$ better and better as $T \to \infty$, similarly to the Fourier expansion of a periodic function. Hence, the zeroes of $\zeta$ encode in principle everything we need to know about the distribution of primes (and vice versa). We may think of the zeroes as "frequencies" with which the counting function of prime numbers resonates. For this reason, they are of fundamental importance in mathematics.

The explicit formula will play a key role in the proof of the Prime Number Theorem. Indeed, to establish the asymptotic formula $\psi(x) \sim x$, it suffices to bound $\sum_{|\gamma|\le T}x^\rho/\rho$ and prove that it is of negligible size compared to $x$. Since $|x^\rho| = x^\beta$, this essentially reduces the Prime Number Theorem to showing that $\beta$ is a bit less than 1 for all zeroes of $\zeta$.


Cauchy’s residue theorem and the explicit formula 


Let us now give a rough sketch of the proof of Theorem 5.1. The complete details will be given in Chapter 8, after having developed the necessary tools.


We present the argument in a more general context. Recall the Perron inversion formula (5.9), valid for any arithmetic function $f$ whose Dirichlet series $F$ converges absolutely to the right of the line $\mathrm{Re}(s) = 1$. Similarly to $\zeta$ and $\zeta'/\zeta$, the Dirichlet series $F$ of many interesting arithmetic functions can be meromorphically continued to a half-plane $\mathrm{Re}(s) > \alpha_0$ with $\alpha_0 < 1$. In this case, the integral on the right-hand side of the Perron inversion formula can be studied using complex analysis, as we explain below.

³The letter $\gamma$ here is not to be confused with the Euler–Mascheroni constant defined by (1.13). This ambiguous notation is customary in the literature.

⁴The contribution of the trivial zeroes has been absorbed into the error term. There is an even more precise version of the explicit formula that takes into account the trivial zeroes (see Exercise 8.2(a) and […, Chapter 17]). The version stated above is sufficient for most applications.


Fix $\alpha' \in (\alpha_0, 1)\setminus\{0\}$ such that $F(s)$ has no poles when $\mathrm{Re}(s) = \alpha'$. Such an $\alpha'$ always exists because $F$ has at most countably many singularities in any given open region. Moreover, let $T = T(x)$ be large enough so that

$$\sum_{n\le x}f(n) = \frac{1}{2\pi i}\int_{\substack{\mathrm{Re}(s)=\alpha\\ |\mathrm{Im}(s)|\le T}}F(s)\frac{x^s}{s}\,ds + E, \tag{5.12}$$

with $E = O\big(\sum_{n\le x}|f(n)|\big)$. The existence of such a $T$ is guaranteed by (5.6). Furthermore, similarly to $\alpha'$, the parameter $T$ can be chosen in such a way that $F$ has no singularities on the lines $\mathrm{Im}(s) = \pm T$.


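Let $C_1$ denote the contour of integration in (5.12), that is to say, the line segment from $\alpha - iT$ to $\alpha + iT$. We write symbolically $C_1 = [\alpha - iT, \alpha + iT]$. We deform $C_1$ to a new contour of integration consisting of the line segments $C_2 = [\alpha - iT, \alpha' - iT]$, $C_3 = [\alpha' - iT, \alpha' + iT]$ and $C_4 = [\alpha' + iT, \alpha + iT]$ (see Figure 5.1). We denote this new contour by $C_2 + C_3 + C_4$.⁵ We claim that

$$\frac{1}{2\pi i}\int_{C_1}F(s)\frac{x^s}{s}\,ds = \sum_{2\le j\le 4}\frac{1}{2\pi i}\int_{C_j}F(s)\frac{x^s}{s}\,ds + \sum_{w\in\Omega}\operatorname{res}_{s=w}\Big(\frac{F(s)x^s}{s}\Big), \tag{5.13}$$

where the rightmost sum runs over all singularities of $F(s)/s$ in $\Omega := \{s \in \mathbb{C} : \alpha' < \sigma < \alpha,\ |t| < T\}$. Indeed, the integrand $F(s)x^s/s$ is meromorphic in $\Omega$ and analytic in an open neighborhood of the boundary $\partial\Omega$. Since $\partial\Omega = C_1 - C_2 - C_3 - C_4$ when traversed counterclockwise, Cauchy's residue theorem implies that

$$\frac{1}{2\pi i}\int_{\partial\Omega}F(s)\frac{x^s}{s}\,ds = \sum_{w\in\Omega}\operatorname{res}_{s=w}\Big(\frac{F(s)x^s}{s}\Big).$$

This proves our claim that (5.13) holds.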
Combining (5.12) and (5.13), we infer that

$$\sum_{n\le x}f(n) = \sum_{w\in\Omega}\operatorname{res}_{s=w}\Big(\frac{F(s)x^s}{s}\Big) + E + R,$$

where

$$R = \sum_{2\le j\le 4}\frac{1}{2\pi i}\int_{C_j}F(s)\frac{x^s}{s}\,ds.$$

We think of $R$ as an error term because $|x^s/s| \le x^\sigma/|t|$, so that the integrand $F(s)x^s/s$ is small on $C_2 \cup C_4$ because $|t| = T$ is large, and it is also small on $C_3$ because $\sigma = \alpha' < 1$. In reality, we also need bounds on $F(s)$ to estimate $R$. Such estimates can be a bit tricky to obtain outside the region of absolute convergence. We will see methods of establishing them in Chapters 6 and 8, among others.

Figure 5.1. The poles of $\zeta(s)\zeta(s-i)\zeta(s+1/2+2i)/s$ inside the rectangle defined by the points $\pm2 \pm iT$.

⁵In general, if $C$, $C'$ are two contours with a given orientation, then $C + C'$ denotes the contour that first traces $C$ and then $C'$ in their respective orientation. Furthermore, $-C$ is the contour $C$ traced in the opposite orientation.

Assuming that $R$ is indeed negligible, we are led to the guesstimate

$$\sum_{n\le x}f(n) \approx \sum_{\substack{w \text{ is a pole of } F(s)/s\\ \alpha' < \mathrm{Re}(w) < \alpha,\ |\mathrm{Im}(w)| < T}}\operatorname{res}_{s=w}\Big(\frac{F(s)x^s}{s}\Big). \tag{5.14}$$

Combining this heuristic with the residue computation for $\zeta'/\zeta$ carried out above explains why $\psi(x)$ should be closely approximated by the sum $x - \sum_{|\gamma|\le T}x^\rho/\rho$ from the explicit formula (5.11). Its rigorous proof will be given in Chapter 8, after having developed further the theory of the Riemann zeta function (in Chapter 6) and of the Perron inversion formula (in Chapter 7). We will then use the explicit formula for $\psi(x)$ together with a bound on the zeroes of $\zeta$ to establish the Prime Number Theorem in Chapter 8.


We conclude this chapter with some examples that showcase the utility 
and versatility of the ideas presented above. 


Example 5.2. As a toy example, consider the function $f = 1$. We then have that $F = \zeta$, whose only singularity is a simple pole of residue 1 at $s = 1$. Thus, the only singularity of $\zeta(s)x^s/s$ in the half-plane $\mathrm{Re}(s) > 0$ is a simple pole of residue $x$ at $s = 1$. This leads us to the prediction that $\sum_{n\le x}1 \approx x$. This is of course true, since we know by elementary methods that $\sum_{n\le x}1 = x + O(1)$.
Remark 5.3. In general, if $F$ has a simple pole of residue $r_w$ at a point $w$ that is different from the origin, then

$$\operatorname{res}_{s=w}\Big(\frac{F(s)x^s}{s}\Big) = \frac{r_w x^w}{w}.$$

We can generalize this calculation further: if $F$ has a pole of order $m$ at $w \ne 0$, then there are coefficients $c_{w,0}, c_{w,1}, \dots, c_{w,m-1} \in \mathbb{C}$ such that

$$\operatorname{res}_{s=w}\Big(\frac{F(s)x^s}{s}\Big) = x^w\big(c_{w,m-1}(\log x)^{m-1} + c_{w,m-2}(\log x)^{m-2} + \cdots + c_{w,0}\big).$$

Indeed, let $F(s)/s = a_{w,m}/(s-w)^m + \cdots + a_{w,1}/(s-w) + \sum_{j\ge0}b_{w,j}(s-w)^j$ be the Laurent expansion of $F(s)/s$ about $s = w$. In addition, we have the Taylor series expansion $x^s = x^w\sum_{j\ge0}(s-w)^j(\log x)^j/j!$. Hence, the claimed formula for $\operatorname{res}_{s=w}(F(s)x^s/s)$ holds with $c_{w,j} = a_{w,j+1}/j!$.


Example 5.4. Consider the divisor function $\tau$, for which we have the convolution identity $\tau = 1 * 1$. Thus, its Dirichlet series is $\zeta(s)^2$, which has a meromorphic continuation to $\mathbb{C}$ whose only pole is a double pole at $s = 1$. In view of our heuristic and Remark 5.3, we are led to predict that there are coefficients $c_0, c_1 \in \mathbb{C}$ such that

$$\sum_{n\le x}\tau(n) \approx c_1x\log x + c_0x.$$

To calculate $c_0$ and $c_1$, note that $\zeta(s) = 1/(s-1) + \gamma + O(|s-1|)$ for $|s-1| \le 1/2$ by Exercise 5.2, whereas $1/s = 1 - (s-1) + O(|s-1|^2)$. Hence

$$\frac{\zeta(s)^2}{s} = \frac{1}{(s-1)^2} + \frac{2\gamma - 1}{s-1} + O(1),$$

which implies that $c_1 = 1$ and $c_0 = 2\gamma - 1$. This agrees with the asymptotic formula for $\sum_{n\le x}\tau(n)$ proven earlier.
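The prediction of Example 5.4 is easy to test numerically. The sketch below computes $D(x) = \sum_{n\le x}\tau(n)$ exactly with Dirichlet's hyperbola identity $D(x) = 2\sum_{n\le\sqrt{x}}\lfloor x/n\rfloor - \lfloor\sqrt{x}\rfloor^2$. It then compares $D(x)$ with $x\log x + (2\gamma - 1)x$. The names are illustrative, and the numerical value of $\gamma$ is hard-coded. The test checks only the classical $O(\sqrt{x})$ size of the remainder, and shows that the lower-order term $(2\gamma-1)x$ really is needed.

```python
import math

EULER_GAMMA = 0.5772156649015329  # the Euler-Mascheroni constant γ


def divisor_summatory(x):
    """D(x) = sum_{n <= x} τ(n) for an integer x >= 1, via the hyperbola identity."""
    r = math.isqrt(x)
    return 2 * sum(x // n for n in range(1, r + 1)) - r * r


def predicted_main_term(x):
    """Residue prediction c_1 x log x + c_0 x with c_1 = 1, c_0 = 2γ - 1."""
    return x * math.log(x) + (2 * EULER_GAMMA - 1) * x


x = 10**6
D = divisor_summatory(x)
```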


Example 5.5. Let $f$ be the indicator function of square-full integers (see Exercise 1.6). In Exercise 3.11, we saw that the partial sums of $f$ up to $x$ have an asymptotic expansion with two main terms, of size $x^{1/2}$ and $x^{1/3}$, respectively. These terms can be guessed using (5.14): the multiplicativity of $f$ implies that its Dirichlet series equals

$$F(s) = \prod_p\Big(1 + \frac{1}{p^{2s}} + \frac{1}{p^{3s}} + \cdots\Big) = \frac{\zeta(2s)\zeta(3s)}{\zeta(6s)} \tag{5.15}$$

for $\mathrm{Re}(s) > 1$. Since $\zeta$ has a meromorphic continuation to $\mathbb{C}$, so does $F$. In addition, the only singularities of $F$ in the half-plane $\mathrm{Re}(s) > 1/6$ are simple poles at the points $s = 1/2$ and $s = 1/3$. They both arise from the simple pole of $\zeta$ at $s = 1$. Relation (5.14) then leads us to the prediction that

$$\#\{n \le x : n \text{ square-full}\} \approx \frac{\zeta(3/2)}{\zeta(3)}x^{1/2} + \frac{\zeta(2/3)}{\zeta(2)}x^{1/3}.$$
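The two-term prediction of Example 5.5 can be compared with an exact count. The sketch rests on a standard fact: every square-full $n$ has a unique representation $n = a^2b^3$ with $b$ squarefree. The count therefore equals $\sum_{b \text{ squarefree},\ b^3\le x}\lfloor\sqrt{x/b^3}\rfloor$. The zeta values, including the negative value $\zeta(2/3)$, come from a short Euler–Maclaurin approximation for real $s$. This is an approximation, not an exact evaluation. All names are illustrative. The test assumes the remaining error is well below $x^{1/3}/4$ at $x = 10^9$. That is consistent with the known $O(x^{1/6})$ size of the error term, but is not proved here.

```python
import math

# Standard Bernoulli numbers B_2, ..., B_12 (enough for the real s used below)
BERNOULLI_EVEN = [1 / 6, -1 / 30, 1 / 42, -1 / 30, 5 / 66, -691 / 2730]


def zeta_real(s, N=20):
    """Euler-Maclaurin approximation of ζ(s) for real s != 1 (accurate for moderate |s|)."""
    total = sum(n ** (-s) for n in range(1, N))
    total += N ** (1 - s) / (s - 1) + N ** (-s) / 2
    rising, factorial = s, 2.0  # s(s+1)...(s+2k-2) and (2k)!, starting with k = 1
    for k, b in enumerate(BERNOULLI_EVEN, start=1):
        total += b / factorial * rising * N ** (1 - s - 2 * k)
        rising *= (s + 2 * k - 1) * (s + 2 * k)
        factorial *= (2 * k + 1) * (2 * k + 2)
    return total


def is_squarefree(b):
    d = 2
    while d * d <= b:
        if b % (d * d) == 0:
            return False
        d += 1
    return True


def count_squarefull(x):
    """#{n <= x : n square-full}, using the unique form n = a^2 b^3 with b squarefree."""
    total, b = 0, 1
    while b ** 3 <= x:
        if is_squarefree(b):
            total += math.isqrt(x // b ** 3)
        b += 1
    return total


def predicted_squarefull(x, two_terms=True):
    main = zeta_real(1.5) / zeta_real(3) * x ** 0.5
    if two_terms:
        main += zeta_real(2 / 3) / zeta_real(2) * x ** (1 / 3)
    return main


x = 10**9
count = count_squarefull(x)
```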
Exercises 
Exercise 5.1. Prove that 
((s) = = yr Re(s) > 0). 


Exercise 5.2. When $0 < |s - 1| \le 1$, show the following estimates:

$$\zeta(s) = \frac{1}{s-1} + \gamma + O(|s-1|);$$

$$\log\zeta(s) = -\log(s-1) + \gamma(s-1) + O(|s-1|^2) \qquad (s \notin [0,1));$$

$$\frac{\zeta'}{\zeta}(s) = -\frac{1}{s-1} + \gamma + O(|s-1|).$$


Exercise 5.3. Use (5.14) to predict the main term in the asymptotic formulas for 


yee 08 Fee ON); Dee y(n), ings 73(n) and nce T(n)?. Compare your 
prediction with Theorems 1.12 and 3.2, and Exercises 3.8, 3.10 and 4.5, respectively.


Exercise 5.4* Complete the proof of Theorem 3.4(c) as follows:
(a) Uniformly for $x \ge 2$ and $\varepsilon \in (0,1]$, prove that


S— log (1 - =) = dog (1 - saa) + O(e). 


peux peu 
(b) Uniformly for $x \ge 2$ and $\varepsilon \in (0,1]$, prove that


do log (1 aaa) = _ — du +0(—— =): 


pra 


[Hint: Taylor's theorem.]

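(c) Deduce that the constant in (3.6) is $K = \int_0^\infty u^{-1}\big(e^{-u} - 1_{[0,1]}(u)\big)\,du$. [Hint: Use Exercise 5.2 to rewrite $\log\zeta(1 + \varepsilon/\log x)$.]

(d) Prove that $\gamma = \int_0^\infty u^{-1}\big(1_{[0,1]}(u) - e^{-u}\big)\,du$.

[Hint: Note that $\gamma = \lim_{N\to\infty}\big(-\log N + \int_0^1(1 + x + \cdots + x^{N-1})\,dx\big)$ and let $x = 1 - u/N$.]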


Chapter 6 


The Riemann zeta 
function 


The explicit formula (5.11) underlines the central role of the Riemann zeta function and of its zeroes in the study of prime numbers. We thus undertake a careful study of $\zeta$ in this chapter.


The functional equation 


One of the most fundamental properties of $\zeta$, discovered by Riemann himself, is that it possesses a symmetry with respect to the vertical line $\mathrm{Re}(s) = 1/2$. This symmetry is expressed by the functional equation of $\zeta$: for all $s \in \mathbb{C}$, we have

$$\pi^{-s/2}\Gamma(s/2)\zeta(s) = \pi^{-(1-s)/2}\Gamma\Big(\frac{1-s}{2}\Big)\zeta(1-s), \tag{6.1}$$

where $\Gamma$ is Euler's Gamma function. We can also rewrite (6.1) as¹

$$\zeta(s) = \lambda(s)\zeta(1-s), \qquad\text{where}\qquad \lambda(s) = \frac{\pi^{s-1/2}\,\Gamma((1-s)/2)}{\Gamma(s/2)}. \tag{6.2}$$

Using Exercise 1.13, we find two alternative expressions for $\lambda$:

$$\lambda(s) = \frac{2^{s-1}\pi^s}{\Gamma(s)\cos(\pi s/2)} = 2^s\pi^{s-1}\Gamma(1-s)\sin(\pi s/2). \tag{6.3}$$

As we saw earlier, the Gamma function is very well understood. Hence it suffices to study $\zeta$ in the half-plane $\mathrm{Re}(s) \ge 1/2$ and then use the functional equation to pass to the entire complex plane.


¹In the literature, the function $\lambda$ is usually denoted by $\chi$. Since we have reserved the letter $\chi$ for Dirichlet characters, we use the letter $\lambda$, which is the first letter of the Greek word λόγος that means ratio.




Let us now show (6.1). In the process of doing so, we will see another proof of the meromorphic continuation of $\zeta$ to $\mathbb{C}$.


At the heart of the proof of (6.1) lies the Poisson summation formula (Theorem B.3). Let $f : \mathbb{R} \to \mathbb{C}$ be as in that theorem, that is to say, $f \in C^2(\mathbb{R})$ and $f^{(j)}(x) \ll 1/x^2$ for $|x| \ge 1$ and $j \in \{0,1,2\}$, so that its Fourier transform

$$\hat f(\xi) = \int_{\mathbb{R}}f(x)e^{-2\pi i x\xi}\,dx$$

satisfies the bound $\hat f(\xi) \ll 1/\xi^2$ for $|\xi| \ge 1$. Assume further that $f$ is even and consider its Mellin transform

$$F(s) = \int_0^\infty f(x)x^{s-1}\,dx,$$

which is well defined for $0 < \mathrm{Re}(s) < 2$. The change of variables $x \to nx$ implies that

$$n^{-s}F(s) = \int_0^\infty f(nx)x^{s-1}\,dx.$$

Summing this formula over all $n \ge 1$ when $1 < \mathrm{Re}(s) < 2$, we find that

$$\zeta(s)F(s) = \int_0^\infty S_f(x)x^{s-1}\,dx \qquad\text{with}\qquad S_f(x) = \sum_{n\ge1}f(nx). \tag{6.4}$$

Next, we use Poisson's summation formula (B.3) to deduce that

$$f(0) + 2S_f(x) = \sum_{n\in\mathbb{Z}}f(nx) = \frac{1}{x}\sum_{n\in\mathbb{Z}}\hat f(n/x) = \frac{\hat f(0) + 2S_{\hat f}(1/x)}{x},$$

since $f$ and $\hat f$ are even. We then split the range of integration in the integral representation of $\zeta(s)F(s)$ as $(1, +\infty) \cup [0,1]$. We also make the change of variables $x \to 1/x$ in the portion over $[0,1]$. We thus find that

$$\zeta(s)F(s) = \int_1^\infty S_f(x)x^{s-1}\,dx + \int_1^\infty S_f(1/x)x^{-s-1}\,dx = \int_1^\infty S_f(x)x^{s-1}\,dx + \int_1^\infty S_{\hat f}(x)x^{(1-s)-1}\,dx + \frac{\hat f(0)}{2(s-1)} - \frac{f(0)}{2s}.$$


In order to symmetrize the above formula, we choose $f(x) = 2e^{-\pi x^2}$, which is self-dual, that is to say, $\hat f = f$. For this choice of $f$, we have that

$$F(s) = 2\int_0^\infty e^{-\pi x^2}x^{s-1}\,dx = \pi^{-s/2}\int_0^\infty e^{-y}y^{s/2-1}\,dy = \pi^{-s/2}\Gamma(s/2),$$

so that

$$\pi^{-s/2}\Gamma(s/2)\zeta(s) = \int_1^\infty S(x)\big(x^{s-1} + x^{(1-s)-1}\big)\,dx + \frac{1}{s-1} - \frac{1}{s},$$

with $S(x) = 2\sum_{n\ge1}e^{-\pi n^2x^2}$. The right-hand side of the above formula is clearly an analytic function for $s \in \mathbb{C}$ except for simple poles at $s = 1$ and at $s = 0$. It is also invariant with respect to the change of variables $s \to 1 - s$, thus proving the functional equation (6.1). Since $\Gamma$ does not vanish in $\mathbb{C}$ (by Theorem 1.14) and has a simple pole at $s = 0$, we also deduce the meromorphic continuation of $\zeta$ to $\mathbb{C}$ with its sole singularity being a simple pole at $s = 1$.
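The functional equation and the three expressions (6.2)–(6.3) for $\lambda$ can be spot-checked at a few real points. This is a numerical sketch, not a proof. Python's `math.gamma` accepts only real arguments, so complex $s$ are not covered. $\zeta$ is approximated by Euler–Maclaurin summation with Bernoulli terms. All function names are illustrative.

```python
import math

# Standard Bernoulli numbers B_2, B_4, ..., B_20
BERNOULLI_EVEN = [1 / 6, -1 / 30, 1 / 42, -1 / 30, 5 / 66, -691 / 2730, 7 / 6,
                  -3617 / 510, 43867 / 798, -174611 / 330]


def zeta_real(s, N=30):
    """Euler-Maclaurin approximation of ζ(s) for real s != 1 with Re(s) > -19."""
    total = sum(n ** (-s) for n in range(1, N))
    total += N ** (1 - s) / (s - 1) + N ** (-s) / 2
    rising, factorial = s, 2.0  # s(s+1)...(s+2k-2) and (2k)!
    for k, b in enumerate(BERNOULLI_EVEN, start=1):
        total += b / factorial * rising * N ** (1 - s - 2 * k)
        rising *= (s + 2 * k - 1) * (s + 2 * k)
        factorial *= (2 * k + 1) * (2 * k + 2)
    return total


def completed_zeta(s):
    """Left side of (6.1): π^(-s/2) Γ(s/2) ζ(s), for real s avoiding poles of Γ(s/2)."""
    return math.pi ** (-s / 2) * math.gamma(s / 2) * zeta_real(s)


def lam(s):  # λ(s) as in (6.2)
    return math.pi ** (s - 0.5) * math.gamma((1 - s) / 2) / math.gamma(s / 2)


def lam_cos(s):  # first expression in (6.3)
    return 2 ** (s - 1) * math.pi ** s / (math.gamma(s) * math.cos(math.pi * s / 2))


def lam_sin(s):  # second expression in (6.3)
    return 2 ** s * math.pi ** (s - 1) * math.gamma(1 - s) * math.sin(math.pi * s / 2)


SAMPLE_POINTS = (0.3, 0.75, 2.5, -1.5)
```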


The zeroes of ζ and the Riemann Hypothesis


Let us now discuss the zeroes of the Riemann zeta function, which are intimately related to the distribution of primes.

When $\mathrm{Re}(s) > 1$, we know that $\zeta(s)$ is given by an absolutely convergent Euler product whose factors do not vanish. In particular, $\zeta(s) \ne 0$ (see Remark 4.7(b)). In addition, Theorem 1.14 implies that $\Gamma$ does not vanish at all in $\mathbb{C}$. As a consequence, the left-hand side of the functional equation (6.1) is non-zero for $\mathrm{Re}(s) > 1$. Hence, the right-hand side of (6.1) must also not vanish in the same region. Equivalently, $\zeta(s)\Gamma(s/2) \ne 0$ when $\mathrm{Re}(s) < 0$. However, note that $\Gamma(s/2)$ has simple poles at the points $-2, -4, \dots$. Hence, $\zeta$ must have simple zeroes at $-2, -4, \dots$, and no other zeroes when $\mathrm{Re}(s) < 0$. The pole of $\Gamma(s/2)$ at $s = 0$ does not induce a zero of $\zeta$, because it is counterbalanced by the pole of $\zeta(1-s)$ at $s = 0$. In fact, $\zeta(0) = -1/2$.

As we mentioned in Chapter 5, the zeroes of $\zeta$ at the negative even integers are called trivial. All other zeroes lie in the strip $0 \le \mathrm{Re}(s) \le 1$ and are called non-trivial. We denote them by $\rho = \beta + i\gamma$. These are the zeroes appearing on the right-hand side of the explicit formula (5.11). For this reason, the strip $0 \le \mathrm{Re}(s) \le 1$ is called the critical strip.


The functional equation and the obvious symmetry $\zeta(\bar s) = \overline{\zeta(s)}$ imply that if $\rho$ is a non-trivial zero of $\zeta$, then so are the numbers $\bar\rho$, $1 - \rho$ and $1 - \bar\rho$. In his 1859 mémoire, Riemann postulated that all non-trivial zeroes of $\zeta$ lie on the line $\mathrm{Re}(s) = 1/2$, which is the line of symmetry of $\zeta$. We thus refer to this line as the critical line. Riemann's conjecture is known today as the Riemann Hypothesis.


The Riemann Hypothesis is a very important conjecture because it offers us unparalleled control on the distribution of primes. To explain this claim, we go back to the explicit formula (5.11). If $\rho = 1/2 + i\gamma$ for all non-trivial zeroes of $\zeta$, then

$$\Big|\sum_{|\gamma|\le T}\frac{x^{1/2+i\gamma}}{1/2+i\gamma}\Big| \le \sqrt{x}\sum_{|\gamma|\le T}\frac{1}{|1/2+i\gamma|}.$$

As we will prove later (see Lemma 8.2(a)), the number of zeroes with $|\gamma| \in [n, n+1]$ is $\ll \log(n+2)$, so that

$$\sqrt{x}\sum_{|\gamma|\le T}\frac{1}{|1/2+i\gamma|} \ll \sqrt{x}\sum_{0\le n\le T}\frac{\log(n+2)}{n+1} \ll \sqrt{x}\log^2T. \tag{6.5}$$

Taking $T = x$ establishes the remarkably accurate estimate

$$\psi(x) = x + O(\sqrt{x}\log^2x) \tag{6.6}$$

uniformly for all $x \ge 2$. By partial summation (see Exercise 1.7), this is equivalent to having that

$$\pi(x) = \mathrm{li}(x) + O(\sqrt{x}\log x) \tag{6.7}$$

uniformly for all $x \ge 2$, where we recall that $\mathrm{li}(x) = \int_2^x dt/\log t$. Juxtaposing this estimate and (0.4), we see that under the Riemann Hypothesis the distribution of primes is as close to being "random" as we could hope for (up to factors of $\log x$ and $\log\log x$). In fact, Exercise 8.2(c) shows that we cannot replace $\sqrt{x}$ by a smaller power of $x$ in (6.7), thus making this scenario
"the best of all possible worlds".
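As a small numerical illustration of (6.6), the sketch below computes $\psi(x)$ by summing $\log p$ over all prime powers $p^k \le x$. It then compares $|\psi(x) - x|$ with $\sqrt{x}\log^2x$ at a single point. This is only a numerical observation at one $x$ and says nothing about the Riemann Hypothesis itself. The function name is illustrative. The test compares against $\sqrt{x}\log^2x/(8\pi)$, an assumption chosen as a stricter threshold than the constant-free size in (6.6).

```python
import math


def chebyshev_psi(x):
    """ψ(x) = sum_{n <= x} Λ(n), computed as sum of log p over prime powers p^k <= x."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    total = 0.0
    for p in range(2, x + 1):
        if sieve[p]:
            pk = p
            while pk <= x:  # one log p for each power p, p^2, ... up to x
                total += math.log(p)
                pk *= p
    return total


x = 10**6
difference = chebyshev_psi(x) - x
```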
In contrast, the best known unconditional version of the Prime Number Theorem has a much weaker error term, of size $x\exp(-c(\log x)^{3/5}/(\log\log x)^{1/5})$ […, Corollary 8.30]. In Chapter 8, we will establish the Prime Number Theorem with a remainder term of size $x\exp(-c\sqrt{\log x})$ by showing that $\zeta$ does not vanish too close to the line $\mathrm{Re}(s) = 1$.


Remarkably, (6.6) (and hence (6.7)) is equivalent to the Riemann Hypothesis: indeed, let $\psi(x) = x + E(x)$ and apply (1.12) with $a_n = \Lambda(n)$, $f(n) = 1/n^s$, $y = 1$ and $z \to +\infty$ to find that

$$-\frac{\zeta'}{\zeta}(s) = \frac{s}{s-1} + s\int_1^\infty\frac{E(u)}{u^{s+1}}\,du \qquad (\mathrm{Re}(s) > 1).$$

But if $E(u) = O(\sqrt{u}\log^2u)$ for $u \ge 2$, the right-hand side of the above formula is meromorphic for $\mathrm{Re}(s) > 1/2$, thus providing a meromorphic continuation of $-\zeta'/\zeta$ to the half-plane $\mathrm{Re}(s) > 1/2$, with the only pole located at $s = 1$. In particular, $\zeta$ does not vanish in this half-plane. By the functional equation (6.1), $\zeta$ has no non-trivial zeroes in the half-plane $\mathrm{Re}(s) < 1/2$ either, and the Riemann Hypothesis follows. We have thus established:

Theorem 6.1. The Riemann Hypothesis is true if and only if $\psi(x) = x + O(\sqrt{x}\log^2x)$ for all $x \ge 2$.


The order of magnitude of ζ


In the previous chapter, we gave a rough outline of the proof of the explicit formula for $\psi(x)$. More generally, we saw how to estimate the partial sums of an arithmetic function $f$ in terms of the singularities of its Dirichlet series $F$. A crucial technical step that we set aside in this discussion is the need to bound $F$ past its region of absolute convergence. We explain here how to do this when $F = \zeta$. In Chapter 8, we will develop additional tools that will also allow us to handle the quotient $\zeta'/\zeta$ and establish Theorem 5.1.


So, let us suppose that we are given some s € C. We then want to 
understand the size of ¢(s). The functional equation (6.2) and Exercise [1.12 
imply that 


(6.8) IS(8)| Xe |¢(1— 8) -|e/?-7 (-C < o < 1/2, |e] > 1). 


Hence, it suffices to bound |¢(s)| when Re(s) > 1/2. 


When Re(s) > 1, this is relatively easy: since ¢ is given by an absolutely 
convergent Euler product on this half-plane, we have 


(6.9) 1/C(a) < |C(o + it)| < C2). 


Indeed, this follows by noticing that 1—|z| < |1/(1—z)| < 1/(1—|z|) when 
|z| < 1. In particular, we conclude that |¢(s)| <- 1 foro > 1+¢. We then 
also find by that |¢(s)| ze |t}?-* for.¢ & |—C, =e). 

On the other hand, bounding ¢ inside the critical strip 0 < Re(s) < 1 
is much harder. It turns out we can use the information we have out- 
side the critical strip to extrapolate a bound for ¢ inside it. This uses the 
Phragmén-Lindel6f principle [159| Chapter 12], which is a generalization of 
the maximum modulus principle. 


Theorem 6.2. Let f be a function that is analytic in an open neighborhood 
of the vertical strip ay < Re(s) < ag, and for which there is an absolute 
constant C’ such that f(s) < exp{|t|"} when ar < Re(s) < ag. Assume 
further that f(o; + it) « (1+ |t|)% for j =1,2 and allt ER. 

Given o € [ay, a2], there is a unique u € [0,1] such that o = ua, + (1 — 
u)az. We then have 


flo tit) < (1+ [eyeate-e (te R). 


We postpone the proof of this theorem momentarily because it is a bit 
technical and use it to study ¢. In fact, because of the pole of ¢ at s = 1, we 
work instead with the function f(s) = (s — 1)¢(s) that is entire. Note that 
f grows at most polynomially in |t|, that is to say, f(s) <<¢ (1+ |t|)O™, as 
it can be readily seen by relation (6.8). Since ¢(1 +e+it) <- 1 and ¢(—e+ 
it) <¢ |t|!/2+® for |t] > 1, Theorem 62] implies that ¢(s) « |t|(-?+®)/? for 
—exgo<l+eand |t| >1. 


To summarize the above discussion, we have shown the following result. 


The order of magnitude of ¢ 67 


Theorem 6.3. Fire > 0 andC > 1. Fors =oa+it with o > —C and 
|t| > 1, we have 


1 ifo>l1ite, 
Cs) <e0< |O-F? af exe <1+e 
eam f—C<o<+ 


Motivated by the above theorem, we define 


log |¢(a + it)| 
6.10 (a) = lim sup —-7-___— 
cae, (?) aan log |t| 


for each o € R, that is to say, €(o7) is the smallest number such that 
ICO + #)| Keo EMF (le BD) 


for each fixed ¢ > 0. The discussion in the beginning of the section implies 
that €(0) = 0 for o > 1, that @(0) = 1/2 —o for o < 0 and that 


(0) =1/2-04+4(1-0). 


Furthermore, Theorem|[6.2]implies that ¢(7) is a convex function. In partic- 
ular, it is continuous (see Exercise |6.7), so that €(1) = 0 and (0) = 1/2. It 


is believed that 
0 if 
(a) = kan 
1/2-—o if0 


1/2, 
o <1/2. 


IN \/ 


This is known as the Lindelof hypothesis. 


The convexity of (a) reduces the Lindeléf hypothesis to the case when 
o=1/2. Any improvement of the exponent 1/4 in the estimate ¢(1/2+it) << 
|¢|!/4+© of Theorem [6.3] is called a subconvexity estimate. In turn, this is 
essentially equivalent to proving that the sum Doe . n* is small compared 
to x (i.e., it “exhibits cancellation”) when x is in the vicinity of |t|!/? (see 
formula (7.18)). The current record is £(1/2) < 13/84 = 0.154 due to 
Bourgain [16]. 


Proof of Theorem By a linear change of variables, we may assume 
that ay = 0 and ag = 1, so that our goal is to show that f(o + it) <« 
(1 + |t|/)-%+% In addition, we may assume that 61,02 > 0; otherwise, 
we replace f(s) by f(s)(s +1)* for a large integer k. 

To study f at height t, we consider the function f(z + it) with 0 < 
Re(z) < 1. We further normalize f(z +t) to be bounded on the boundary 
of the strip 0 < Re(z) < 1 by letting 


gz) = f(z tit) /{a of |t|) 0-814 282 (z+ 1)", 


68 6. The Riemann zeta function 


where N = max{[01],/@2|}. Indeed, note that 
|ge(iy)| << (1+ e+ yl) / [C+ fel - + yl] <1 
for y € R, by our assumption that N > 6; > 0. Similarly, |g:(1 + zy)| < 1. 
We have shown that g: is uniformly bounded on the boundary of the 


strip 0 < Re(z) < 1. If we knew its maximum occurred on this boundary, 
the theorem would readily follow, since 


(6.11) f(a + at)| < NL + |e) OW ¥% « |g(o)| (O< 0 <1). 


The main idea of the Phragmén-Lindel6of principle is to construct an aux- 
iliary function which is bounded and whose maximum does occur on the 
boundary of the strip 0 < Re(z) < 1. To this end, we let 


he(2) = exp{e(e™™*/4 + e*/4)}, 
where € > 0 is fixed for the moment. If z = x + iy, then note that 
Re(e%™?/4 + goers) = cos(ma/4)(e~7Y/4 + emy/4) > eTlUl/4 1/2 


for « € [0,1] and y € R. Our assumption that f(z + it) < exp{|t + y|} 
implies that the function g;/h-z is bounded. In fact, we have |(g:/h-)(a + 
iy)| < 1 for all x € [0,1], as long as y is large enough in terms of ¢ and t (we 
suppress the dependence on C, since we consider it fixed). 


Let Y = Y(e,t) be such that |(g:/he)(x + iy)| < 1 for y > Y and 
x € [0,1]. For each T > Y, we consider the rectangle Rr with vertices iT 
and 1+7T. Note that g:/h- is uniformly bounded on the boundary of Rr 
(independently of ¢, t and JT’). The maximum modulus principle implies that 
(gt/he)(z) <1 for all z € Rr. Letting T — oo, we find that g:(z) < he(z) 
for all z in the strip 0 < Re(z) < 1, uniformly in t € R and ce > 0. We then 
let ¢ + 0* to deduce that g:(z) < 1, uniformly in t. Hence, the theorem 
readily follows by (6.11). 


Exercises 


Exercise 6.1. Show that the Riemann Hypothesis is equivalent to knowing that 
Re(p) < 1/2 for all non-trivial zeroes p. 


Exercise 6.2. Show that ¢(0) < 0 when —-2 <a <1. 
Exercise 6.3. Prove that ¢(—n) = (—1)"Bnii/(n +1) for n > 0. [Hint: Exercise 


d).| 
Exercise 6.4. Use (6.1) and Theorem [14] to show that 
¢! ¢! 1 41 1 1 1 
(<2) = l ( 
ae a _ ig re be Ds n 


Conclude that (¢’/¢)(0) = log(27) and ¢’(0) = — log(27) /2. 


Exercises 69 


Exercise 6.5. For t € R, let] 
V(t) = arg I'(1/4 + it/2) — t(log 7) /2. 
Show that Hardy’s function Z(t) := e¢(1/2 + it) is real valued. 


Exercise 6.6. Let 6 € (0,1) be such that >), <, u(n) = O(x ®) for all z > 1. Prove 
that ¢(s) 4 0 for Re(s) > 6. 


Exercise 6.7. Prove that a convex function f : [a,b] > R is continuous. [Hint: 
For each x € [a,b], show that the ratio (f(y) — f(x))/(y — x) is increasing as a 
function of y € [a,b] \ {x}.] 


Exercise 6.8. Prove that the function ¢(c) is non-negative and decreasing. 


Exercise 6.9. Let f(s) be a bounded analytic function in an open neighborhood of 
the strip 0 < Re(s) < 1. If M, =sup,|f(o + it)|, then show that M, < Mj~° MZ. 
[Hint: Consider f(s) = f(s)Mg-'M, */(1 + es).] 


Exercise 6.10. When o; = —1 and o, = 1, show that we may relax the condition 
f(s) < exp{|t|©} in Theorem [6.2] to |f(s)| < exp{Ae?l1}, where A > 0 and 
0< B< 1/2. Furthermore, show that we cannot take B > 1/2. [Hint: Consider 
the function f(s) = exp{cos(7s/2)}.] 


Since I'(s) does not vanish and is analytic for Re(s) > 0, we may define log I'(s) for Re(s) > 0. 
We take the branch that is real valued for s > 0. Then argI'(s) = ImlogI(s), as usual. 


Chapter 7 


The Perron inversion 
formula 


If f is an arithmetic function whose Dirichlet series F’ can be meromor- 
phically continued past its domain of absolute convergence, then we expect 
that the asymptotic behavior of the partial sums of f is determined by the 
singularities of F’, as described in the guesstimate (6.14). The main goal of 
this chapter is to justify this heuristic. 

In our discussion of Perron’s inversion formula in Chapter [5] we ignored 
a subtle technical issue: the choice of the parameter T used to truncate the 
integral Sia) (F'(s)x*/s)ds and approximate it by | one jer (F (s)0*/s)ds. In 
practice, it is important to have a quantitative form of Perron’s inversion 
formula that allows us to choose T as an appropriate function of x. To 
do so, we approach from a slightly different point of view. 


The key observation is that 


lrent (x ; Dyes 
So fn) + Ss =) 6(n/a) with 0(y) = Ipcyca + 
N<ax nel 
The Mellin transform of 6 is . d(y)y®tdy = 1/s, so that 


lrenf (2) a a/n)* 
(7) Say ¢ PE) cane fm) ds 


Nn<x“ n>1 


for each a > 1 (or, even, for a > 0). The next natural step is to interchange 
the order of summation and integration, which would yield (5.9). It is hard 
to justify this step directly because the integrals on the right side of do 
not converge absolutely. We discuss two ways to circumvent this problem. 


70 


7. The Perron inversion formula 71 


The first method: Truncating Perron’s integral. Instead of using the 
exact formula d(y) = (1/277) Say */s)ds with a > 0, we can use the 
following truncated form of it that offers substantial technical flexibility. 


Lemma 7.1. Uniformly for y >0,a>0 andT > 1, we have 


1 y* s= eee + O(y-*/max{1,T|logyl}) ify #1, 


977. | Re(s)=a ; —_ 
ant cater S 1/2 + O(a/T) goal 


We postpone the proof of the above result till the end of the chapter. 
Note that if >7,,5,|f(n)|/n® converges for some a > 0, then Lemma 
implies 


72) Hoy + eT = 5s |stina FUN 88 + OCC" 


nce | Im(s)|<T 


where 


R= ee | 
di mt Te |} 


The error term can be bounded if f does not grow too fast, thus yielding a 
quantitative version of (5.9) as follows. 


Theorem 7.2. Let f be an arithmetic function with Dirichlet series F(s) = 
yr, f(n)/n®. Assume there are constants A,C >0 and 6 > —1 such that 


lf(n)| < Cn°(1+logn)4 = (n EN). 
For x,T >2 anda >64+1+41/logz, we have that 


x x (log x)4+1 = AY. 
S> f(n) aa [ret _ F(s)—ds + O( r + ~~ (log x) Ne 


nSx | Im(s)|<T 


the implied constant depends at most on A,C and 6. 
We prove Theorem[7.2]at the end of the chapter, along with Lemma[Z.1] 


The second method: Using smooth cut-offs. We now discuss an alter- 
native way to obtain a quantitative form of Perron’s inversion formula. The 
underlying cause for the slow decay of the integrand in Perron’s inversion 
formula is that the function 6 is discontinuous. This is a reflection of 
the uncertainty principle in harmonic analysis: the discontinuous function 
6 is too localized on the interval [0,1], so its Mellin transform must have 
relatively heavy tails (that is to say, it cannot decay too fast at infinity). 
In order to get around this issue, we approximate 6 by a more delocalized 
function whose Mellin transform decays faster at infinity. 


72 7. The Perron inversion formula 


| 
| 
| 
0 114i/r 
Figure 7.1. The graph of the function 6r(y). 


A concrete example is provided by the function 


i if0<y<l, 
ér(y)=<T+1-Ty ifl<y<1+41/T, 
0 ify >1+1/T, 


where JT’ > 1 is some large parameter that plays an analogous role to that 
of the truncation point in Lemma[Z.]] (see Figure [Z.1). By construction, we 


have 
LO) = Amjsr(n/x2)+O( YD IF)I). 


n<x n>1 a<nxat+a/T 
We then rewrite 67 in terms of its Mellin transform, which is equal to 
lee) 1 + ioe nil 
5 s—lq i ( 
: v(y)y? dy ss+D/T 
Notice that this is an absolutely integrable function on each line Re(s) = 
a # 0, as opposed to the Mellin transform of 6. In addition, Theorem 
(applied here with a; = 1 and a2 = oo) implies that 


ae (1+1/T)s*! -1 7 
ona) 20 I s(s+1)/T ey nyde 


for any a > 1. As a consequence, 


8 stl _ 


(73) a 20 s (s+1)/T 
| +0( > lre@l). 
a<nxa+ae/T 


Notice that (1 + 1/T)st+ —1 ~ (s+ 1)/T when s = o(T), so that the 
integrands in and in Theorem[7.2]are asymptotically the same for small 
s. In addition, the absolute convergence of the integral in allows us to 
truncate it in a very straightforward way. The larger T is, the better we can 
control the error term )),en<x42/7 |f(m)|, but the worse bounds we have on 
the Mellin transform of dp due to the presence of T~! in the denominator. 
Consequently, we have to choose the parameter J’ in an optimal way that 


7. The Perron inversion formula 73 


balances the gains and the losses. A similar situation arises when using 
Lemma|[?.1]to truncate Perron’s formula. We will see a concrete application 
of this version of Perron’s formula later, in the proof of Theorem [13.2 


Taking the above idea one step further, we approximate the sharp cut-off 
function 6 by a smooth function ¢ € C™(Rso) such that 


oy) =1 if0<y<1, 
(7.4) O<dly) <1 if1<y<141/T, 
o(y) =0 ify >1+1/T. 


Example 7.3. A simple way to construct such a ¢ is to a a smooth 
function g > 0 that is supported on [0,1] and for which fo 9 dz = 1. We 
then set gr(x) = T-g(Tx), which is ao on 1/T] and es integral 


over R also equals 1, and take o(y = fea ytl/ rq w)dw for y > 0. Clearly 
0< oy) < f™ gr(w)dw = 1. Move if ye i 1], then y—1< 0 and 


y+1/T >1/T, so that d(y = fi’7 w)dw = 1. Finally, if y >1+1/T, 
then y —1 > 1/T, so that ae =, aN proves that ¢ satisfies (7.4). 


Notice that for the constructed function ¢ we have 
(7.5) |S lloo Ke T*. 


This is the typical behavior for the /th derivative of functions ¢ satisfying 
(7.4), since they vary by 1 in an interval of length 1/T. 


Given ¢ satisfying (7.4), we consider its Mellin transform 


= | o(y)y® ‘dy 
0 


that converges absolutely for 0 > 0. Integrating by parts & + 1 times, and 
noticing that ¢(*+)) is supported on [1,1 + 1/T], we find 


_4)et 141/T 
s(s = ? -(s+k) | a) (yy dy. 


This provides a meromorphic continuation of ® to the entire complex plane. 
The only potential poles are at the points s = —k for k € Zso of residue 


o0 (k) (Q 
resi-4 (3) =F fo May = © — 


By (74), we have that 4(0) = 1 and ¢)(0) = 0 for k > 1, so that the only 
pole is at s = 0 and its residue equals 1. 


Finally, using (7.6), we find that 
B(s) <<, T! - [s\-*-2 - pF]. 1 + 1m 


(7.6) &(s) = 


74 7. The Perron inversion formula 


for |Im(s)| > 1 and k € Zso. If ¢ satisfies (7.5), then 


Ti _( ap 1/T)ymax{a.0} 


ini 1 1/T max{o,0} = 
( ) (s ) <r ( le / ) oo |s|a+1 |s| max{1,|s|/T}* 


for |t| > 1 and k € Zso. In particular, ®(s) starts decaying extremely fast 
as soon as |s| > 7’, which is in accordance with the uncertainty principle. 


Let us know see how we can use the above discussion to estimate the 
partial sums of f. We start by observing that 


Lim = YL Fon/a)+O( YO lel). 


n<x n>1 a<nxat+a/T 


For any a > 0, Mellin’s inversion formula (Theorem [B.4) implies that 
1 
———— ® 8 
oln/a) = 55 [| ®(5)(a/n)as 


If F'(s) converges absolutely when Re(s) = a, then 


(7.8) Sf (n) on/r) = > 5 a0 we ds 


n>1 


=— F(s)® zs) 
where the change of order of summation and integration is justified by 
Lebesgue’s Dominated Convergence Theorem. 


Similarly to ¢, it is often the case that F' has a meromorphic continuation 
to a half-plane o > ao for some ap < a. In this case, F’ usually satisfies its 
own version of Theorem [6.3} for any fixed o > ao, the function |F'(o + it)| 
is bounded by a suitable power of |t| when |t} + co. On the other hand, 
(Z7) implies that |®(o + it)| grows faster than any fixed power of |t|, so 
that F'(s)®(s) is absolutely integrable on any vertical line Re(s) = a’ with 
a’ > ao. Using Cauchy’s residue theorem, we arrive at the formula 


a F(s)®(s)a°ds = =. F'(s)®(s)x°ds 
27% J (a) 200 J (a!) 
(7.9) 
+ > reS.—w(F'(s)®(s)x°*), 
a’ <Re(w)<a 


where the sum on the last line runs over the singularities of F'(s)®(s). In- 
deed, follows by letting T + oo in (5.13) with ®(s) in place of 1/s. 

Finally, the integral on the right-hand side of is estimated using 
(7.7) and analogues of Theorem [6.3] for F'(s). We give the necessary details 
in the second example below. 


Two examples 75 


Two examples 


We demonstrate how to use the Perron inversion formula in practice by 
employing it to study the partial sums of 7, and to count square-full integers. 


Averaging the divisor functions 7. We wish to estimate the summatory 
function of the kth divisor function 7, where k € Zs. We have that 


which has a meromorphic continuation to C with its only singularity being 
a pole of order k at s = 1. Since (n) < nOv@/leslosn) for n > 3 by 
Exercise 2.9(f), we fix « > 0 and apply Theorem [7.2] with 6 = </2, A= 0 
and a=1+e. We havel+e>60+4+1+41/logz for r> e2/e whence 


1 ae ait log x 
(7.10) tel) = = enh ~ds 4 One ( = ) 


nXe | Im(s)|<T 


uniformly for « > T > 2 and x > e?/*. We will apply (6.13) to replace the 
contour [1 +¢—i7T,1+e¢+iT] by Ci + Cz + C3, where 


C, =[l+e-iT,a'-iT], Cy=[o'-iT,a'+iT], C3 =[o'+iT,1+e+iT] 


for some a’ € (0,1) to be chosen later. All implied constants in what follows 
might depend on a’, € and k. 


Consider the rectangle whose vertices are the points 1+ ¢+7T and 
a’ +iT. The only pole of the integrand in (7.10) inside this rectangle is at 
s = 1 and has order k. Consequently, 


s k 1 8 1+e l 
> f(n) = ress=1 acts) + af C(s)* ds + o(——**). 
eel 8 201 JC,4+C24+C3 8 T 
Remark [5.3] implies that there is a polynomial P, of degree k — 1 and with 
leading coefficient 1/(k — 1)! such that 


ress—1(#°C(s)*/x) = x - P,(log x). 

We treat the integral over Cy + C2 + C3 as an error term and bound it 
crudely. Using Theorem[6.3] we have ¢(s) <- (1+|t|)“-?+®/? for |s—1] >> 1 
and 0 <a <1+e. Since we also have ja’ + it] x 1+ |t| for a’ > 0, we find 

s T 
(s)* “ds = if C(a! + it)* 
8 -T 


Fee 
ro it 


a! + it 


¢ dt 
C2 


r 
<| (1 + jeje teh ae 
_T 


< gh TU-a' te) k/2 


76 7. The Perron inversion formula 


Similarly, we have that 


((s)* =as = | gat) a0 « | OS p-ete)k/21gea 
s 3 s= o—t 74a a Lao 


(7-11) &  tiae PCCeee 
a'<o<l+e 


and the same estimate holds for the integral over C3. Assuming that T < 
22/k the maximum in (711) occurs when o = 1+. 


To conclude, we have proved that 
git€ log x ) 


»~ 7%; (n) = oP, (log t) + Oneal Ca eg ee F 


NKx 


The error term increases when a’ increases. Taking a’ = ¢ yields 


~ T.(n) = «P;,(log x) + Ox. (2° (T*?? + 2/T)logz). 


Nx 


k+2) 


We optimize this estimate by taking T = x?/( Replacing ¢ by ¢/2, we 


arrive at the following result. 


Theorem 7.4. Fiz k € Zs ande > 0. There is a polynomial P;, of degree 
k —1 and of leading coefficient 1/(k — 1)! such that 


> Th(n) = xP, (log x) + Ope rr) (eo 21). 

Nx 
Remark 7.5. Theorem [/.4]improves upon Exercise [3.10] when k > 3, while 
yielding a slightly weaker version of Theorem [3.3] when k = 2. Since in 
its proof we took a’ very close to 0, it is tempting to examine what would 
happen had we chosen a/ < 0. The calculation is a bit different now, since 
\C(a! + it)| <qr |t|'/?-@ for a! < 0 and |t| > 1. It turns out that this idea 
does not lead to an improvement of Theorem [7.4] We leave the verification 
of this claim as an exercise. 


Square-full integers. As in Example[5.5) let f be the characteristic func- 
tion of square-full integers and 


C(3s 
= yp A) _ sents, 


ne1 
Theorem [7.2] with 6 = A = 0 and a= 1+1/logz implies that 
C(2s)¢(3s) x x log x 
iG Ses is go o( T ) 
N<Ku | Im(s)|<T 


uniformly for z > T > 2. This formula puts us right away at a disadvantage: 
the error term should really be of size O(a'/2+*/T), because Va ti) = 
x, that is to say, the parameter 0 is —1/2 on average. We could prove a 


Two examples (es 


more general version of Theorem[7.2]that would allow such an improvement, 
but it is significantly simpler to work instead with a smooth cut-off ¢. 

Let T > 2 be a parameter that we will choose later. By Example [7.3] 
there are functions d+ € C™(Rso) such that 


loa-1/7] < & <1pay < ¢* < loadin, 
and |\(¢*)||5o <p T* for each fixed k. Then 
(7.12) do F(m)o(n/2) < D7 Fm) < DF F(n)o* (n/a). 
nel ner n>1 


Let ¢ € {¢ ,¢*}, and let ® denote its Mellin transform, which satisfies 
(7.7). We use relations (7.8) and (79) to find that 


(3/2) (1/2) 4/2, 62/3) 81/3), 
a dS) ln/a) = “ay gt tay 3 
42 f LNB) gr sras 


2774 (a’) ¢(6s) 
for any a’ € (1/6,1/3). To ease notation, let a’ be a from now on. 


To estimate the integral over the line Re(s) = a, we use Theorem [6.3] to 
find that 


C(2s)C(3s) cam (1 i eye eC 4. EW eam ca _ (1 re ye eee 
for any fixed ¢ € (0,1/2]. Finally, since Re(6s) = 6a > 1, we have |¢(6s)| > 
1/¢(6a) Sq 1 from relation (6.9). Together with (77), this implies that 

¢(28)¢(38) (1+ [é|) Sa/? #000 
¢(6s) max{1, |¢|//T}* 
for any fixed k > 0. We use the above inequality to bound the integral in 
(7.13): we have 


®(s)a* <e,a,k 


2 T 
/ _ ¢( SEKS8) Bears <| (1+ lal Bereta ae Fs Wd aaa a 
ae Ae 7 


since we have assumed that a < 1/3. On the other hand, 


—5a/2+e€,.a 
} eS) alanis < | [¢| x dt < Pi —ba/2te go 
Re(s}=a ¢ \t]>T 


2 
Bets)-a C(65) (a7) 
assuming that ¢ < 1/2. Inserting these bounds into , we conclude that 
= Hh (8/2) ®C/2) aye, c/s) 21/3) 4) 
ss ¢(3) 2 ¢(2) 3 


4 O(a Toate), 


78 7. The Perron inversion formula 


Finally, since lig 4-1/7) < ¢ < 1poi41/7}, we have that 
1 
o(s)= | y® ‘ds + O(1/T) = 1/s + O(1/T) 
0 
for s € {1/2,1/3}, so that 


l/2 
2 fe) on/2) - ae 4 at 2f, o( 4 ane ee) 


for ¢ = o*. We optimize the error term by taking a = 1/6 + ¢/2 and 
T = x*/', Together with (712) this implies the estimate 


> Fln) a te x ee aN + Ont S846), 


This recovers the main terms of Exercise|3.11| but has a worse error term. 


Nx 


We thus see that even though using Perron’s inversion formula offers 
an intuitive way of establishing asymptotic formulas, it is sometimes possi- 
ble to prove superior results using more elementary methods. Hence, it is 
important to be fluent in both ways of approaching a problem. 


Truncating Perron’s integral 

We conclude the chapter by proving Lemma [7.1] and Theorem [7.2] 

Proof of Lemma [7.1] First, we consider the case 0 < y < 1. We fixa 

large A > 0 and apply (6.13) with F(s) = 1 and a’ = —A to find that 

1 (1/y)* 
s 


Oni Re(s)=a 
[Im(s)|<T 


1 
(7.14) ds =1+—— on aj i-1 + fo + hi), 


where J_ is the integral of y~*/s over the line segment [a—iT, —A—iT], Ip 
is over [—A—iT, —A+iT] and J; is over [-A+iT,a+iT]. What is important 
in the above formula is that the integrand is very small on the new contour 
of integration: either the denominator is large (in [41, that are supported 
on the horizontal line segments [-A+iT,a+iT]), or the numerator is small 
(in Jp, that is supported on the vertical line segment [—A — iT, —A + iT]) 
because y > 1. More concretely, 


—A+iT (1/y)* Qa oe y° 
Ii = d </fa aca f ve 
= | ar 8 Sea lol < Thlogal Tegal 


whereas Ae 
“a 
lo= f tect dt. 
-AiT 8 A+ Ti 


Letting A — oo, we have Ig + 0. This proves the lemma when y < e 


=e 


Truncating Perron’s integral 79 


When e-/7 < y < 1, we use a variation of (6.13): we replace the 
line of integration L = {s € C: 0 =a, |t| < T} by the circular arc 
C={s EC: |s| = Va?+T?, 0 < a} traversed clockwise. As in (7.14), 
Cauchy’s theorem implies that [, (y~*/s)ds = Jo(y~*/s)ds+27i. To aes 
the integral over C, we note that |y~*/s| < y°/T for s € C. Since the 
length of C is = T, the lemma follows in this case too. 

The case y > 1 is similar. However, instead of shifting the contour to the 
left, we shift it to the right, so that y~*/s becomes smaller in magnitude. 
No pole is encountered this time. We leave the details as an exercise. 


It remains to handle the case y = 1. We argue by direct computation: 
oy es es" jae fr dt 
Qri J Re(s)=0 gs =n Jo \atit a—it/ awry o%4 87° 
| Im(s)|<T 


The rightmost integral equals arctan(T'/a)/a = 1/2 + O(a/T), which com- 
pletes the proof of the lemma. 


Proof of Theorem [7.2] Using the fact that f(n) < Cn®(1+ logn)4 and 
(7.2), we have 


v* Qa, 6 A 
», ie ~ Oni =e F(s) a ds+O(a*+ E+ 2" (logz)”), 


nga | Im(s)|<T 


where 


(1+ logn)4 (1 + log n) 
ae y no wut T|log =|} ~ 2 4 nitt/loge max{1, T| log =|} 


We write EF = £, + Ey + E3 + Ey, where FE} is the part of the sum with 
|c—n| <1, Eo is with 1 < |x—n| < «/T, Es is with max{1,2/T} < |z—-n| < 
xz/2 and Eg is with |x — n| > «7/2. 

We clearly have that E, < (logx)4/x. The sum £2 has non-empty 
range only when x > T,, in which case we have 


(1+logn)4 — x (logx)4 


< = 
Fy < mit1/logx ve r 


z—2/T<n<a+a/T 


For the terms in the range of £3, we note that |log(a/n)| = |n — 2|/x, by 
Taylor’s expansion of the logarithm about 1, so that 


(log x)4 (log x)4 

EF ————_ < ——— 

2 » eer ye 
max{1,z2/T}<|n—a|<a/2 1<23 <ax/2 23<|x—n|<2I9+1 


Since there are < 2) integers n with 27 < |x — n| < 2/+1, we deduce that 
E3 « (logx)4+!/T. Finally, for the terms in the range of E4 we note that 


80 7. The Perron inversion formula 


| log(x/n)| >> 1, whence 


A+1 


(1 +logn)4 (log x) 
Ey < Da, mit1/log x <A T 


by an application of the Euler-Maclaurin summation formula (Theorem 
1.10). (Alternatively, note that the contribution of n € [x/,aI*1) to Ey 
is < T~!(logx)4+1(j + 1)4e-9. Summing over all j >0 proves the claimed 
bound.) Putting together the above estimates implies that E< (log x)4+!/T 
+ (log x)4/x, thus completing the proof of the theorem. 


Exercises 


Exercise 7.1. Consider an arithmetic function f with Dirichlet series F'(s) con- 
verging absolutely for Re(s) = a > 0. Prove that 


=f C4 Jay = Jo f(n)(1—n/2) = mf Og” 


n<y N<kx 


and 
oe f(n =) f(n) log(a/n) = =n F(s) sds. 
Qri 8? 
n<y NXgx (a) 


[Hint: Mellin inversion for ¢(y) = ly<1-(1— y) and w(y) = ly<i - log(1/y).] 


Exercise 7.2. Let ¢ € C™(Rso) be compactly supported, and let ® denote its 
Mellin transform. 


(a) Show that ®(s) is analytic for Re(s) > 0. 


(b) If (y) = ¢0 when y € [0,1], show that © has a meromorphic continuation to 
C whose only singularity is a simple pole of residue ¢o at s = 0. 


(c) If supp(¢) C [0,m], then prove that ®(s) <g4 mm /(1 + |s|)4 for all 
s € Cand any fixed A> 1. 


(d) Let f be an arithmetic function with Dirichlet series F'(s) converging absolutely 
for Re(s) =a > 0. For x > 1, prove that 


aS f(n ae =e F'(s)®(s)a*ds. 


n21 


Exercise 7.3. Let f(n) = ?(n)/p(n) and let F(s) be its Dirichlet series. 


(a) Prove that F(s) = ¢(s+1)G(s), where G(s) is a Dirichlet series that converges 
absolutely for Re(s) > —1/2. 


(b) Write G as an Euler product and calculate its logarithmic derivative G’/G. 
Deduce that G(0) = 1 and G’(0) = D7, (log p)/(p? — p). 


Exercises 81 


(c) For 1 <T <a and a€é (—1/2,0), prove that 


D1) = 5 frcay-t/roge PAs + O((log2)?/7) 


NRa | tn(s)|<T 


1 s 
=logn +9 +600) += | F(s)—ds + O(log x)?/T), 
2Qri Jo 8 


where C is the sum of the contours [1/log x —iT,a—iT], [a—1T,a+iT] and 
[a +iT,1/logx + iT). 
(d) Use Theorem [6.3] to estimate the integral over C’ and deduce that 
Yo fm) = loge +7 + G(0) + O-(0-7/***) (@ 1) 


nKx 
for any fixed e > 0. 


Exercise 7.4. Use Theorem and its variants to estimate Dee log n, 


a s pi? (n) and ince Y(n). Compare your results with those obtained by Theo- 
rem [1.12] Exercise B.8] and Theorem [3.2] respectively. 


Exercise 7.5. An integer n is called cube-free if there is no prime p such that p*|n. 
On the other hand, an integer is called cube-full if p?|n whenever p|n. Estimate the 
number of cube-free and cube-full integers in [1, a]. 


Exercise 7.6. Show that there is a linear polynomial L, a quadratic polynomial 
Q and some 6 > 0 such that 


S- gu(n) =7- L(log x) +4 O(a~°) (a = 1) 
nN<u 


and 


5 2°) =2-Ologs)+O(@"*) (#21). 


Nu 


Exercise 7.77 Let s =o + it with 0<o < 1 and |t| > 
(a) For « >T >2 anda=1-—0+41/logz, show that 


1 1 x x? logs 


n<Xxa | Im(z)|<T 


(b) If | Im(z)| < |t|/2 and € > 0, then show that 


eae Bee late if —o0 < Re(z) < 
+2)<& 
¢(s+2) <e [6)t/2-e Bele) +6 if Re(z) < -o. 


(c) Let ¢ > 0 and A > 1 be fixed. Uniformly for x > |t|, prove that 


(718) SCs) + One ((tl?-74*(H1/al)4 +2 HE), 


nxn 


82 7. The Perron inversion formula 


Exercise 7.87 Let ¢ be as in Exercise [Z.2(b) with do = 1. 


(a) Consider s € C with 0 < o < 1 and |t| > 100, as well as x,y > 1 with 
xy = |t|/27. Show that 


YM) ore tcot sf slot 2)8)2*de, 
(-3) 


271 


(b) Use (6.2) to write ¢(s + z) = A(s + z)¢(1 — s — z) and deduce the approximate 
functional equation 


(7.16) 6(s) = So SO) 4S SOND 9g g(t), 
where : i : 
02 (u) = 265 ®(z)A(s + z)(ult|/27)*dz. 
(—3) 


(c) Show that is meromorphic with its poles located at the odd positive integers 
with resz-on41 A(z) = (—1)"~122"*1 772" /(2n)!. Finally, use Exercise [12] to 
show that \(a+ib) <, max{|b|, 1}1/2-¢ when a,b € R are such that |a+ib—k| > 
1/2 for k = 1,3,5,.... 

(d) Let a € (—oo, 3/2] \ {0,1 — o}. Prove that 

1 
oe. (u) = -— ®(z)A(s + z)(ult|/27)* dz 
271 (a) 
+ laso* A(s) — lasi-o * 28(1 — s)(ult|/2m)*~*. 
(e) Fix e > 0 and B > 0. Show that 


ur if u > |t\®, 
[eee ress? if0<u< lel. 


5 (U) <e,B 


[Hint: When wu > |t|®, take a = —C in part (d) with C big enough in terms of 
e and B. Otherwise, take a = 3/2.] 

(f) Combine parts (b) and (e) to prove that ¢(s) «z |t|"-7)/2+* for |t| > 1 and 
0 <o <1, thus recovering Theorem [6.3] inside the critical strip. 

Exercise 7.97 

(a) Let z = a+ ib with a,b € R such that |a| < |b|?/3 and |z — k| > 1/2 for 
k =1,3,5,.... Show that 

IA) (2m) max |b, 22-4, 

[Hint: Use the relations \(z) = \(Z) and A(z) = 1/A(1 — z) to first reduce to 
the case when b > 0 and a > 1/2. When a > 1/2 and b > 1, use Exercise 
[1.12] noticing that |z/2|* = (b/2)*(1+ O((a/b)”))* and arg(z/2) = 7/2—a/b+ 
O((a/b)?) for a < b. Similar estimates also hold for |(1 — z)/2|* and arg((1 — 
z)/2).| 

(b) For s € C with 0 < o < 1 and |¢| > 100, and for z,y > 1 with xy = |t|/2z7, 
show that 


(717) (8) = D> Ao) Sy ay + Ool(a-? + [ely H)IEN. 


nxn n<y 


Exercises 83 


[Hint: Consider ¢ € C°(Rso) with Mellin transform ®. Assume (7.4) and 
(7.7) with T = |t|\/2-©. For z = a+ ib with |z| > |t\|!/? > |a| and |d| < |t|/2, 
show that 

B(z)A(s + z)(|t|/2m)* ea |t|A]1 + b/t| "(1 + A/T Mord, 
When wu < 1— 2/T, use this estimate to show that 


83) = 5) — Ff popjasyn PMG + 2)(ultl/2n)Fde + Oal ltl) 


| Im(s)|<|¢]"/? 


1 z —A 
= No) 55 fesayainia BEING + 2)(ultl/2n)*d2 + Olt), 
| Im(s)|<|¢]*/? 
since (z)A(s + z) has no poles when Re(z) > 3/2 and |Im(z)| < |t|!/?. Con- 
clude that ¢*(u) = \(s) + O4(|t|-4). On the other hand, when u > 1 +2/T, 
show that ¢*(u) <4 u~1!°\t|-4 by moving the contour to the line Re(z) = 
—|e\"?] 


Remark 7.6. When 0 < Re(s) < 1, formula allows us to approximate 
¢(s) accurately by a sum of |t|!*© terms. On the other hand, taking x = y = 
\/|t|/2m in (2.17), we can write ¢(s) as a linear combination of two much 
shorter sums, each of length \/|t|/27. In particular, 


: il ss 1 7 
¢(1/2 — it) = S- ar +e 2iv(t) > re ale O- (|| Ler) 
N<y/|t|/2a n<v/|t|/20 
(7.18) — ¢-t(t) 3 2 cos((t) — tlogn) O-(|t[-¥/442), 


n<v/|t|/20 ve 
where ¥(t) is defined in Exercise For this reason, formula has 
significant applications. On a theoretical level, it is very useful when study- 
ing the value distribution and the moments of ¢. On a practical level, it 
allows us to calculate ¢ fast inside the critical strip. Indeed, a variation of 
was used by Riemann himself to calculate the first few non-trivial 
zeroes of ¢ and verify they are on the line Re(s) = 1/2. Riemann’s exact 
variation of was rediscovered by Siegel when he was studying 
Riemann’s handwritten notes at the University of Gottingen and it is 
known today as the Riemann-Siegel formula. For a detailed discussion of 
this subject, see Chapter 7 of Edward’s book on ¢ [87]. 


Chapter 8 


The Prime Number 
Theorem 


Having developed the theory of the Riemann zeta function and of the 
Perron inversion, we use them to establish a quantitative version of the 
celebrated Prime Number Theorem. 


Theorem 8.1. There is a constant c > 0 such that 
n(x) = li(x) + O(xe~% ver) (a> 2). 


Instead of working with m(a), we work with Chebyshev’s function 7(x) = 
ence A(m). Our first goal is to establish the explicit formula (6.11). Sub- 
sequently, we will show that ¢(s) 4 0 when Re(s) © 1. This will allow us to 
bound the sum over zeroes in (5.11) and obtain Theorem 


Proving the explicit formula 


In order to use the techniques of the previous chapter, we need to control 
¢’/¢ past its domain of absolute convergence. The key technical estimate is 
the following lemma, whose proof we postpone till the end of the chapter. 


Lemma 8.2. Let $s=\sigma+it\in\mathbb{C}$.


(a) There are $\ll\log(|t|+2)$ non-trivial zeroes $\rho=\beta+i\gamma$ of $\zeta$ with $|\gamma-t|\le1$, even when counted with multiplicity.


(b) If $|s+2n|\ge1/2$ for all $n\in\mathbb{N}$, then
$$\frac{\zeta'}{\zeta}(s)=-\frac{1}{s-1}+\sum_{|\gamma-t|\le1}\frac{1}{s-\rho}+O(\log(|s|+2));$$
the sum runs over non-trivial zeroes of $\zeta$ listed with their multiplicity.




Proof of Theorem 5.1. Let $x,T\ge2$. For technical reasons, we first prove the theorem when $|T-\gamma|\gg1/\log T$ for all $\rho$. Our starting point is Theorem 7.2, which yields the formula


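$$\psi(x)=\frac{1}{2\pi i}\int_{\substack{\mathrm{Re}(s)=1+1/\log x\\ |\mathrm{Im}(s)|\le T}}\Big(-\frac{\zeta'}{\zeta}(s)\Big)\frac{x^s}{s}\,ds+O\Big(\frac{x\log^2x}{T}+\log x\Big).\tag{8.1}$$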


Next, we replace the contour of integration $[1+1/\log x-iT,\,1+1/\log x+iT]$ by the contour $L_{-1}+L_0+L_1$, where we have set $L_{-1}=[1+1/\log x-iT,\,-2N-1-iT]$, $L_0=[-2N-1-iT,\,-2N-1+iT]$ and $L_1=[-2N-1+iT,\,1+1/\log x+iT]$ for a fixed large integer $N\ge1$. Indeed, relation with $\alpha=1+1/\log x$ and $\alpha'=-2N-1$ implies that


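$$\psi(x)=\sum_{\substack{-2N-1<\mathrm{Re}(w)<1+1/\log x\\ |\mathrm{Im}(w)|<T}}\mathrm{Res}_{s=w}f(s)+\frac{1}{2\pi i}\int_{L_{-1}+L_0+L_1}\Big(-\frac{\zeta'}{\zeta}(s)\Big)\frac{x^s}{s}\,ds+O\Big(\frac{x\log^2x}{T}+\log x\Big),$$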


where the sum over $w$ runs over all poles of $f(s):=(-\zeta'/\zeta)(s)x^s/s$ in the rectangle with vertices $1+1/\log x\pm iT$ and $-2N-1\pm iT$. Our next task is to locate all such poles.


The pole of $\zeta$ at $s=1$ induces a pole of $f$ of residue $x$. Moreover, for each zero $\rho$ of $\zeta$ of multiplicity $m_\rho$, we obtain a pole of residue $-m_\rho x^\rho/\rho$ (see relation and the discussion preceding it). Finally, there is a pole at $s=0$ of residue $-\log(2\pi)$, and poles at $s=-2n\ge-2N$ of residue $x^{-2n}/(2n)\le4^{-n}$. Therefore


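$$\psi(x)=x-\sum_{|\gamma|\le T}\frac{x^\rho}{\rho}+\frac{1}{2\pi i}\int_{L_{-1}+L_0+L_1}\Big(-\frac{\zeta'}{\zeta}(s)\Big)\frac{x^s}{s}\,ds+O\Big(\frac{x\log^2x}{T}+\log x\Big),\tag{8.2}$$
where each zero $\rho$ is repeated according to its multiplicity; the constant $-\log(2\pi)$ and the residues at the trivial zeroes, which sum to at most $\sum_{n\ge1}4^{-n}=1/3$, have been absorbed into the error term.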
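The residues at the trivial zeroes are harmless: for $x\ge2$ they form the convergent series $\sum_{n\ge1}x^{-2n}/(2n)=-\tfrac12\log(1-x^{-2})$. A quick numerical check in Python (the function name is illustrative):

```python
import math


def trivial_zero_residues(x, N):
    """Sum over 1 <= n <= N of the residues x^{-2n}/(2n) at s = -2n."""
    return sum(x ** (-2 * n) / (2 * n) for n in range(1, N + 1))


x = 2.0
for N in (1, 5, 50):
    # each residue is at most 4^{-n} when x >= 2, so the sum stays below 1/3
    assert trivial_zero_residues(x, N) <= 1 / 3
# the full series equals -(1/2) log(1 - x^{-2}) (Taylor series of log(1 - y))
print(trivial_zero_residues(x, 200), -0.5 * math.log(1 - x ** -2))
```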


Next, we bound the contribution of the integrals over $L_{-1}$ and $L_1$. On these segments we have $|\mathrm{Im}(s)|=T$. Recall that we have assumed that $|T-\gamma|\gg1/\log T$ for all $\gamma$. If $\beta+i\gamma$ is a zero of $\zeta$, so is $\beta-i\gamma$, and we deduce that $|T+\gamma|\gg1/\log T$ as well. We thus find that $|s-\rho|\gg1/\log T$ for $s\in L_{\pm1}$ with $\sigma\ge-1$. Together with Lemma 8.2, this implies that
$$\frac{\zeta'}{\zeta}(s)\ll\log T+\sum_{|\gamma-t|\le1}\log T\ll\log^2T$$




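for all $s\in L_{\pm1}$ with $\sigma\ge-1$. When $\sigma<-1$, we have the better bound $(\zeta'/\zeta)(s)\ll\log(|s|+2)\ll\log(T+|\sigma|)$, since the distance of $s$ to the zeroes of $\zeta$ is $\ge1$. Consequently,
$$\int_{L_{\pm1}}\frac{\zeta'}{\zeta}(s)\frac{x^s}{s}\,ds\ll\int_{-1}^{1+1/\log x}\frac{(\log T)^2x^\sigma}{T}\,d\sigma+\int_{-2N-1}^{-1}\frac{\log(T+|\sigma|)\,x^\sigma}{T}\,d\sigma\ll\frac{x\log^2(xT)}{T}.$$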


When $s\in L_0$, we have $|s+2n|\ge1$ for all $n\in\mathbb{N}$ and $|s-\rho|\ge2N+1$ for all $\rho$. Hence, $(\zeta'/\zeta)(s)\ll\log|s|$, which implies that
$$\int_{L_0}\frac{\zeta'}{\zeta}(s)\frac{x^s}{s}\,ds\ll\int_{-T}^{T}\frac{\log(N+|t|)\,x^{-2N-1}}{N+|t|}\,dt\ll\frac{\log^2(N+T)}{x^{2N+1}}.$$
Inserting these estimates into (8.2) and letting $N\to\infty$ completes the proof of Theorem 5.1 when $|T-\gamma|\gg1/\log T$ for all zeroes $\rho=\beta+i\gamma$ of $\zeta$.


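Finally, consider the general case. By Lemma 8.2(a), there are $\ll\log T$ zeroes of $\zeta$ in the horizontal strip $\{T\le\mathrm{Im}(s)\le T+1\}$. By the pigeonhole principle, there is $T'\in[T,T+1]$ such that $|T'-\gamma|\gg1/\log T'$ for all zeroes. We then have
$$\psi(x)=x-\sum_{|\gamma|\le T'}\frac{x^\rho}{\rho}+O\Big(\frac{x\log^2(xT)}{T}+\log x\Big).$$
In addition, Lemma 8.2(a) implies that
$$\Big|\sum_{T<|\gamma|\le T'}\frac{x^\rho}{\rho}\Big|\le\sum_{T<|\gamma|\le T'}\frac{x}{|\gamma|}\ll\frac{x\log T}{T}.$$
This proves Theorem 5.1 for all $T\ge2$.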
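The pigeonhole step can be made explicit. If $k$ ordinates lie in $[T,T+1]$, they cut it into at most $k+1$ gaps, the longest of which has length at least $1/(k+1)$; its midpoint is then at distance at least $1/(2(k+1))$ from every ordinate, which is $\gg1/\log T$ when $k\ll\log T$. A minimal sketch, with made-up ordinates that are not zeroes of $\zeta$ and a function name of our choosing:

```python
def choose_T_prime(T, ordinates):
    """Return T' in [T, T+1] whose distance to every ordinate is at least
    1/(2(k+1)), where k is the number of ordinates lying in [T, T+1].

    The k points cut [T, T+1] into k+1 gaps; the longest has length
    >= 1/(k+1), and we return its midpoint. Ordinates outside [T, T+1]
    are automatically farther away than the endpoints T and T+1.
    """
    inside = sorted(g for g in ordinates if T <= g <= T + 1)
    bounds = [T] + inside + [T + 1]
    best_len, best_mid = -1.0, T
    for left, right in zip(bounds, bounds[1:]):
        if right - left > best_len:
            best_len, best_mid = right - left, (left + right) / 2
    return best_mid


gammas = [100.05, 100.1, 100.5, 100.52, 100.9, 99.99, 101.2]  # made-up ordinates
T_prime = choose_T_prime(100.0, gammas)
print(T_prime, min(abs(T_prime - g) for g in gammas))
```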


A zero-free region and the Prime Number Theorem 


In view of Theorem 5.1, the only ingredient missing from the proof of the Prime Number Theorem is showing that the terms $x^\rho/\rho$ are small compared to the expected main term $x$. Since $|x^\rho|=x^\beta$, we need to prove that $\beta$ is not too close to 1, namely that a certain region is free of zeroes of $\zeta$. This is precisely the content of the next theorem.


Theorem 8.3. There is a constant $c>0$ such that $\zeta(s)\ne0$ in the region
$$\sigma>1-\frac{c}{\log(|t|+2)}.$$
Proof. Let $\rho_0=\beta_0+i\gamma_0$ be a non-trivial zero of $\zeta$. We need to prove that
$$\beta_0\le1-\frac{c}{\log(|\gamma_0|+2)}.\tag{8.3}$$




First of all, since $\zeta$ has a pole at 1, there is an absolute constant $\delta\in(0,1]$ such that $|\rho_0-1|\ge\delta$. In particular, if $|\gamma_0|\le\delta/2$, then $1-\beta_0\ge\delta/2\ge\delta/(4\log2)$, so that (8.3) follows in this case provided that $c\le\delta/4$, as we may certainly assume. For the rest of the proof, we assume that $|\gamma_0|>\delta/2$.


In order to explain the idea of the proof, we first consider the extreme case when $\beta_0=1$, i.e., $\zeta(1+i\gamma_0)=0$. By the analyticity of $\zeta$, we must have $\zeta(\sigma+i\gamma_0)=O(\sigma-1)$ as $\sigma\to1^+$. On the other hand, we have $1/\zeta(\sigma+i\gamma_0)=\prod_p(1-1/p^{\sigma+i\gamma_0})$, and the only way this product can tend to infinity as $\sigma\to1^+$ is if $p^{i\gamma_0}$ points towards $-1$ a lot of the time. But then $p^{2i\gamma_0}=(p^{i\gamma_0})^2$ should point often towards 1, thus implying that $\zeta(\sigma+2i\gamma_0)=\prod_p(1-1/p^{\sigma+2i\gamma_0})^{-1}\to\infty$ as $\sigma\to1^+$, that is to say, $\zeta$ should have a pole at $1+2i\gamma_0$. This is impossible, since $\gamma_0\ne0$ here, and the only pole of $\zeta$ is at 1.


We formalize the above idea by introducing a family of metrics $\mathbb{D}_\sigma(\cdot,\cdot)$, $\sigma>1$, on the set of multiplicative functions taking values in the closed unit disk, which we define by
$$\mathbb{D}_\sigma(f,g)^2=\frac12\sum_p\sum_{m=1}^\infty\frac{|f(p^m)-g(p^m)|^2\log p}{p^{m\sigma}}.\tag{8.4}$$
We think of $\sigma$ as a parameter that will be optimized later in terms of $\gamma_0$.


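By the triangle inequality,
$$\begin{aligned}\mathbb{D}_\sigma(1,n^{2i\gamma_0})&=\mathbb{D}_\sigma(n^{-i\gamma_0},n^{i\gamma_0})\\&\le\mathbb{D}_\sigma(n^{-i\gamma_0},\mu(n))+\mathbb{D}_\sigma(\mu(n),n^{i\gamma_0})\\&=2\mathbb{D}_\sigma(\mu(n),n^{i\gamma_0})\end{aligned}\tag{8.5}$$
(the two equalities hold because multiplying both functions by the unimodular function $n^{-i\gamma_0}$, respectively taking complex conjugates, does not change $|f(p^m)-g(p^m)|$). The above inequality is a rigorous way to see that if $p^{i\gamma_0}\approx-1$ on average, then $p^{2i\gamma_0}\approx1$. We will prove, though, that $\mathbb{D}_\sigma(1,n^{2i\gamma_0})$ cannot be too small because $\zeta$ is analytic around $1+2i\gamma_0$, whereas a zero $\rho_0=\beta_0+i\gamma_0$ too close to $1+i\gamma_0$ would make $\mathbb{D}_\sigma(\mu(n),n^{i\gamma_0})$ rather small.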
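The triangle inequality (8.5) can be observed numerically on a truncated version of (8.4). The sketch below, with helper names of our choosing, cuts the sums at $p\le3000$ and $m\le20$ (an assumption made purely for computability) and takes $\gamma_0=14.1347$ and $\sigma=1.05$ as sample values. Since the truncation is still a weighted $\ell^2$ distance, both the equality $\mathbb{D}_\sigma(1,n^{2i\gamma_0})=\mathbb{D}_\sigma(n^{-i\gamma_0},n^{i\gamma_0})$ and the inequality (8.5) hold exactly for it, up to rounding.

```python
import cmath
import math


def primes_up_to(n):
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = [False] * len(range(p * p, n + 1, p))
    return [p for p, is_p in enumerate(flags) if is_p]


PRIMES = primes_up_to(3000)


def dist(f, g, sigma, m_max=20):
    """Truncation of (8.4) to p <= 3000 and m <= m_max.

    f and g are given by their values f(p, m) at prime powers p^m.
    """
    total = 0.0
    for p in PRIMES:
        for m in range(1, m_max + 1):
            total += abs(f(p, m) - g(p, m)) ** 2 * math.log(p) / p ** (m * sigma)
    return math.sqrt(total / 2)


def n_power(t):
    """The completely multiplicative function n -> n^{it}, evaluated at p^m."""
    return lambda p, m: cmath.exp(1j * m * t * math.log(p))


def one(p, m):
    return 1.0


def mobius(p, m):
    return -1.0 if m == 1 else 0.0


gamma0, sigma = 14.1347, 1.05
lhs = dist(one, n_power(2 * gamma0), sigma)
mid = dist(n_power(-gamma0), n_power(gamma0), sigma)
rhs = 2 * dist(mobius, n_power(gamma0), sigma)
print(lhs, mid, rhs)  # lhs equals mid up to rounding, and lhs <= rhs as in (8.5)
```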


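We start by proving a lower bound on $\mathbb{D}_\sigma(1,n^{2i\gamma_0})$. Since $|1-p^{imt}|^2=2(1-\mathrm{Re}(p^{-imt}))$ for $t\in\mathbb{R}$, we have that
$$\mathbb{D}_\sigma(1,n^{2i\gamma_0})^2=\sum_{p,m}\frac{(1-\mathrm{Re}(p^{-2im\gamma_0}))\log p}{p^{m\sigma}}=-\frac{\zeta'}{\zeta}(\sigma)+\mathrm{Re}\Big(\frac{\zeta'}{\zeta}(\sigma+2i\gamma_0)\Big).$$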


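We evaluate the last term on the right side using Lemma 8.2(b). Since $|\gamma_0|>\delta/2$, we have $|\sigma+2i\gamma_0-1|\ge2|\gamma_0|>\delta$, so the pole of $\zeta$ at 1 contributes only $O(1)$. In addition, we have
$$\mathrm{Re}\Big(\frac{1}{\sigma+2i\gamma_0-\rho}\Big)=\frac{\sigma-\beta}{(\sigma-\beta)^2+(2\gamma_0-\gamma)^2}\ge0$$
for each $\rho=\beta+i\gamma$. As a consequence,
$$\mathrm{Re}\Big(\frac{\zeta'}{\zeta}(\sigma+2i\gamma_0)\Big)\ge-O(\log(|\gamma_0|+2)).$$
Since we also have that $(-\zeta'/\zeta)(\sigma)=1/(\sigma-1)+O(1)$, we conclude that
$$\mathbb{D}_\sigma(1,n^{2i\gamma_0})^2\ge\frac{1}{\sigma-1}-O(\log(|\gamma_0|+2)).$$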


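Next, we deal with $\mathbb{D}_\sigma(\mu(n),n^{i\gamma_0})$. Similarly to before, we have that
$$\begin{aligned}\mathbb{D}_\sigma(\mu(n),n^{i\gamma_0})^2&=-\frac{\zeta'}{\zeta}(\sigma)-\mathrm{Re}\Big(\frac{\zeta'}{\zeta}(\sigma+i\gamma_0)\Big)+O(1)\\&=\frac{1}{\sigma-1}-\sum_{|\gamma-\gamma_0|\le1}\frac{\sigma-\beta}{(\sigma-\beta)^2+(\gamma_0-\gamma)^2}+O(\log(|\gamma_0|+2))\\&\le\frac{1}{\sigma-1}-\frac{1}{\sigma-\beta_0}+O(\log(|\gamma_0|+2))\end{aligned}$$
by dropping all the summands except for the one with $\rho=\rho_0$. Combining the above estimates with (8.5), we find that
$$\frac{1}{\sigma-1}\le\frac{4}{\sigma-1}-\frac{4}{\sigma-\beta_0}+O(\log(|\gamma_0|+2)).$$
If $\sigma=1+1/(C\log(|\gamma_0|+2))$ for some large enough $C$, we have
$$\frac{4}{\sigma-\beta_0}\le\frac{3.5}{\sigma-1},\qquad\text{whence}\qquad1-\beta_0\ge\frac{\sigma-1}{7}=\frac{1}{7C\log(|\gamma_0|+2)}.$$
Taking $c=\min\{\delta/4,1/(7C)\}$ completes the proof of the theorem.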


Remark 8.4. The above proof recasts a classical argument due to Mertens. The original proof has as its starting point the relation
$$3+4\cos\theta+\cos(2\theta)=2(1+\cos\theta)^2\ge0,\tag{8.6}$$
often called the 3-4-1 inequality. Setting $\theta=tm\log p$, multiplying the above inequality by $p^{-m\sigma}\log p$ and summing it over all $p$ and $m$ yields a variant of (8.5). The proof of (8.5) we presented is due to Granville and Soundararajan [74] and fits within the framework of the theory of pretentious multiplicative functions. A full account of this theory is given in [75]. In addition, elements of it can be found in Chapters 13 and 27 of this book.
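The 3-4-1 inequality and its consequence for $\zeta'/\zeta$ can be checked numerically. The Python sketch below verifies the identity in (8.6) on a grid and then evaluates a truncation (to $p\le5000$ and $m\le15$, our choice) of $-3\frac{\zeta'}{\zeta}(\sigma)-4\,\mathrm{Re}\frac{\zeta'}{\zeta}(\sigma+it)-\mathrm{Re}\frac{\zeta'}{\zeta}(\sigma+2it)$, which is a nonnegative combination of the left side of (8.6) with $\theta=tm\log p$. Names and sample values are ours.

```python
import math


def three_four_one(theta):
    """Left-hand side of the 3-4-1 inequality (8.6)."""
    return 3 + 4 * math.cos(theta) + math.cos(2 * theta)


# the identity in (8.6), which makes nonnegativity evident, on a grid of angles
for k in range(2000):
    th = 2 * math.pi * k / 2000
    assert abs(three_four_one(th) - 2 * (1 + math.cos(th)) ** 2) < 1e-12


def primes_up_to(n):
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = [False] * len(range(p * p, n + 1, p))
    return [p for p, is_p in enumerate(flags) if is_p]


PRIMES = primes_up_to(5000)


def minus_log_derivative(s, m_max=15):
    """Truncation of -zeta'/zeta(s) = sum of log p / p^{ms} over p <= 5000, m <= m_max."""
    return sum(math.log(p) * p ** (-m * s) for p in PRIMES for m in range(1, m_max + 1))


sigma, t = 1.05, 14.13
combo = (3 * minus_log_derivative(sigma).real
         + 4 * minus_log_derivative(sigma + 1j * t).real
         + minus_log_derivative(sigma + 2j * t).real)
print(combo)  # a sum of p^{-m sigma} log p times (8.6) at theta = t m log p
```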


We are finally ready to prove the Prime Number Theorem. 


Proof of Theorem 8.1. We will estimate $\psi(x)$ instead; passing to $\pi(x)$ can be easily accomplished by partial summation (see Exercise 1.7(d)).


Combining the explicit formula (5.11) with Theorem 8.3, we find that there is an absolute constant $c_1>0$ such that
$$\psi(x)=x+O\Big(\sum_{|\gamma|\le T}\frac{x^{1-c_1/\log T}}{|\rho|}+\frac{x\log^2x}{T}\Big)$$
for any $T\in[2,x]$. Moreover, since $\zeta(0)\ne0$, we have that $|\rho|\gg1$ for all non-trivial zeroes of $\zeta$. In particular, we have the estimate $|\rho|\gg1+|\gamma|$. Finally, arguing as in (6.5), we find that $\sum_{|\gamma|\le T}1/(1+|\gamma|)\ll\log^2T$. Putting everything together, we conclude that
$$\psi(x)=x+O\big(x^{1-c_1/\log T}(\log T)^2+x(\log x)^2/T\big).$$
Taking $T=e^{\sqrt{\log x}}$, so that $x^{-c_1/\log T}=e^{-c_1\sqrt{\log x}}$ and $(\log T)^2=\log x$, completes the proof.
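Although the argument above is purely analytic, the statement $\psi(x)\sim x$ is easy to observe numerically. The following Python sketch (function name ours) computes $\psi(x)$ exactly by summing $\log p$ over all prime powers $p^k\le x$; the printed differences $\psi(x)-x$ are small compared to $x$, as the theorem predicts, although such finite computations of course prove nothing about the error term.

```python
import math


def chebyshev_psi(x):
    """psi(x) = sum of Lambda(n) over n <= x: every prime power p^k <= x adds log p."""
    x = int(x)
    if x < 2:
        return 0.0
    flags = [True] * (x + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(x) + 1):
        if flags[p]:
            flags[p * p :: p] = [False] * len(range(p * p, x + 1, p))
    total = 0.0
    for p in range(2, x + 1):
        if flags[p]:
            power = p
            while power <= x:  # p, p^2, p^3, ... up to x
                total += math.log(p)
                power *= p
    return total


for x in (10**3, 10**4, 10**5):
    print(x, chebyshev_psi(x) - x)
```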


A bit of complex analysis 


We conclude the chapter with the promised proof of Lemma 8.2. The starting point is a variation of the classical Borel–Carathéodory theorem.


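Lemma 8.5. Consider a function $f(z)=\sum_{n=0}^\infty c_nz^n$ that is analytic in the disk¹ $|z|\le R$ with $f(0)=0$. If $\mathrm{Re}(f(z))\le M$ whenever $|z|=R$, then
$$|c_n|\le\frac{8M}{R^n}\qquad\text{for }n\in\mathbb{Z}_{\ge1}.$$
Furthermore, for $k\in\mathbb{Z}_{\ge0}$ and $|z|\le(1-\epsilon)R$ with $0<\epsilon<1$, we have
$$|f^{(k)}(z)|\le\frac{8k!M}{\epsilon^{k+1}R^k}.$$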


Proof. Since $f$ is analytic in the closed disk $|z|\le R$, a compactness argument implies that it is also analytic in an open disk $|z|<R'$ with $R'>R$. In particular, its Taylor series $\sum_{n=0}^\infty c_nz^n$ converges absolutely when $|z|=R$.


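Note that $c_0=f(0)=0$. Write $c_n=a_n+ib_n$, so that
$$\mathrm{Re}(f(Re^{i\theta}))=\sum_{n=0}^\infty R^n\big(a_n\cos(n\theta)-b_n\sin(n\theta)\big).$$
Fourier inversion (or Cauchy's residue theorem) then implies that
$$R^na_n=\frac1\pi\int_0^{2\pi}\mathrm{Re}(f(Re^{i\theta}))\cos(n\theta)\,d\theta\tag{8.7}$$
for $n\in\mathbb{Z}_{\ge1}$. Similarly, $\frac{1}{2\pi}\int_0^{2\pi}\mathrm{Re}(f(Re^{i\theta}))\,d\theta=a_0=0$, and
$$|a_n|\le\frac{1}{\pi R^n}\int_0^{2\pi}|\mathrm{Re}(f(Re^{i\theta}))|\,d\theta$$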


¹This means that $f$ is analytic in an open neighborhood of the disk $\{z\in\mathbb{C}:|z|\le R\}$.




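for $n\in\mathbb{Z}_{\ge1}$. Hence
$$|a_n|\le\frac{1}{\pi R^n}\int_0^{2\pi}\big(|\mathrm{Re}(f(Re^{i\theta}))|+\mathrm{Re}(f(Re^{i\theta}))\big)\,d\theta=\frac{2}{\pi R^n}\int_0^{2\pi}\max\{\mathrm{Re}(f(Re^{i\theta})),0\}\,d\theta\le\frac{4M}{R^n}.$$
A similar argument implies the same bound for $|b_n|$, and the claimed bound on $|c_n|$ follows. For the bound on $f^{(k)}(z)$, we simply note that $|f^{(k)}(z)|\le\sum_{n\ge k}n(n-1)\cdots(n-k+1)|c_n|((1-\epsilon)R)^{n-k}$ for $|z|\le(1-\epsilon)R$, and then insert our bound for $c_n$ together with the identity $\sum_{n\ge k}n(n-1)\cdots(n-k+1)y^{n-k}=k!/(1-y)^{k+1}$, valid for $|y|<1$.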
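Lemma 8.5 is easy to test on polynomials. The sketch below (function name ours) estimates $M=\max_{|z|=R}\mathrm{Re}(f(z))$ by sampling the circle at finitely many points, which can only underestimate $M$ slightly; the factor 8 in the lemma leaves enough room (the proof above actually gives $4M/R^n$ for $|a_n|$ and $|b_n|$ separately) that the checks should not be affected in practice. It checks the coefficient bound and the case $k=1$ of the derivative bound on the circle $|z|=(1-\epsilon)R$, which suffices by the maximum modulus principle.

```python
import cmath
import math


def borel_caratheodory_check(coeffs, R, eps=0.5, samples=4096):
    """Numerically test Lemma 8.5 for the polynomial f(z) = sum coeffs[n] z^n.

    Returns (M, coeff_ok, deriv_ok), where M is the sampled maximum of Re f
    on |z| = R.
    """
    if coeffs[0] != 0:
        raise ValueError("Lemma 8.5 requires f(0) = 0")

    def f(z):
        return sum(c * z ** n for n, c in enumerate(coeffs))

    def df(z):
        return sum(n * c * z ** (n - 1) for n, c in enumerate(coeffs) if n >= 1)

    circle = [cmath.exp(2j * math.pi * k / samples) for k in range(samples)]
    M = max(f(R * w).real for w in circle)
    coeff_ok = all(abs(c) <= 8 * M / R ** n for n, c in enumerate(coeffs) if n >= 1)
    # k = 1 case of the derivative bound: |f'(z)| <= 8 M / (eps^2 R)
    deriv_ok = all(abs(df((1 - eps) * R * w)) <= 8 * M / (eps ** 2 * R) for w in circle)
    return M, coeff_ok, deriv_ok


print(borel_caratheodory_check([0, 2, -1j, 0.5], R=1.5))
```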


If $f(z)=\log g(z)$ with $g(z)\ne0$ on some disk $|z|\le R$, then $\mathrm{Re}(f(z))=\log|g(z)|$. Lemma 8.5 then allows us to translate an upper bound on $|g(z)|$ into bounds on $f$ and its derivatives. This leads us to the following lemma,
which is a generalization of the fact that a polynomial of degree d and of 
bounded coefficients grows roughly like e? in the unit disk |z| < 1, while 
also having at most d roots there. Part (a) is due to Landau, and part (b) 
is a weak quantitative form of Jensen’s formula from complex analysis. 


Lemma 8.6. Assume that $g(z)$ is analytic in the disk $|z|\le4R$ with $g(0)\ne0$, and let $z_1,\dots,z_k$ be its zeroes in the disk $|z|\le2R$ listed with multiplicity.


(a) If $M\ge0$ is such that $|g(z)|\le e^M\cdot|g(0)|$ when $|z|=4R$, then
$$\Big|\frac{g'}{g}(z)-\sum_{\ell=1}^k\frac{1}{z-z_\ell}\Big|\le\frac{16M}{R}\qquad\text{for }|z|\le R.$$


(b) If $M'\ge0$ is such that $|g(z)|\le e^{M'}\cdot|g(0)|$ when $|z|=2R$, then
$$\#\{1\le\ell\le k:|z_\ell|\le R\}\le2M'.\tag{8.8}$$


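Proof. (a) The function $G(z)=g(z)/\prod_{\ell=1}^k(z-z_\ell)$ is analytic for $|z|\le4R$ and non-zero for $|z|\le2R$. Thus $f(z):=\log(G(z)/G(0))$ is analytic in the disk $|z|\le2R$. It also vanishes at the origin. The maximum modulus principle implies that
$$\max_{|z|\le2R}\Big|\frac{G(z)}{G(0)}\Big|\le\max_{|z|=4R}\frac{|g(z)|}{|g(0)|}\prod_{\ell=1}^k\frac{|z_\ell|}{|z-z_\ell|}\le e^M,$$
since $|z_\ell|\le2R\le|z-z_\ell|$ on the circle $|z|=4R$. Hence $\mathrm{Re}(f(z))\le M$ for $|z|=2R$. We then bound $f'(z)=(g'/g)(z)-\sum_\ell1/(z-z_\ell)$ using Lemma 8.5 (on the disk of radius $2R$, with $k=1$ and $\epsilon=1/2$) to complete the proof of part (a).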


(b) We could use Jensen’s formula to prove the second part. Instead, we 
give a direct proof following Lemma 6.1). 




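Consider the auxiliary function
$$h(z)=g(z)\prod_{\ell=1}^k\frac{2R-z\bar z_\ell/(2R)}{z-z_\ell}$$
that is analytic for $|z|\le2R$. In addition, $|h(z)|=|g(z)|$ for $|z|=2R$, since $\bar z/(2R)=2R/z$ for such $z$ and thus $|2R-z\bar z_\ell/(2R)|=|z-z_\ell|$. The maximum modulus principle then implies that
$$\max_{|z|=2R}|g(z)|\ge|h(0)|=|g(0)|\cdot\prod_{\ell=1}^k\frac{2R}{|z_\ell|}\ge|g(0)|\cdot2^{\#\{1\le\ell\le k:\,|z_\ell|\le R\}}.$$
Since $\max_{|z|=2R}|g(z)|\le e^{M'}|g(0)|$, this gives $\#\{1\le\ell\le k:|z_\ell|\le R\}\le M'/\log2\le2M'$, and the proof is complete.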
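For a polynomial $g(z)=\prod_\ell(z-z_\ell)$ with prescribed zeroes, both sides of (8.8) can be computed directly. A minimal sketch (names ours; the maximum over $|z|=2R$ is estimated by sampling the circle, and the zeroes are chosen off that circle so the logarithm is defined):

```python
import cmath
import math


def zero_count_check(zeros, R, samples=4096):
    """Compare both sides of (8.8) for g(z) = prod_l (z - z_l).

    Returns (count, M_prime), where count = #{l : |z_l| <= R} and M_prime is
    log(max_{|z|=2R} |g(z)| / |g(0)|), estimated by sampling the circle.
    """
    def g(z):
        value = 1 + 0j
        for zl in zeros:
            value *= z - zl
        return value

    g0 = abs(g(0))
    if g0 == 0:
        raise ValueError("Lemma 8.6 requires g(0) != 0")
    M_prime = max(
        math.log(abs(g(2 * R * cmath.exp(2j * math.pi * k / samples))) / g0)
        for k in range(samples)
    )
    count = sum(1 for zl in zeros if abs(zl) <= R)
    return count, M_prime


count, M_prime = zero_count_check([0.5, -0.3 + 0.4j, 0.2j, 1.5, 3.0], R=1.0)
print(count, M_prime, count <= 2 * M_prime)
```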


Proof of Lemma 8.2. Any set of zeroes of $\zeta$ we consider in this proof is implicitly a multiset, with each zero listed as many times as its multiplicity.


(a) Notice that $\zeta(s)\ll1+|t|$ for $|s-1|\ge1$ and $\sigma\in[1/2,5]$ by (5.7). Moreover, relation with $k=3$ implies that $\zeta(s)\ll1+|t|^3$ for $|s-1|\ge1$ and $\sigma\in[-3/2,1/2]$. We then apply Lemma 8.6(b) with $g(z)=\zeta(2+it+z)(1+it+z)$ and $R=3$. We note that $|g(z)/g(0)|=O(1+|t|^4)$ for $|z|\le4R$, since $1/\zeta(2+it)=O(1)$ by (G9). Therefore, if $A=\{\rho:|\rho-2-it|\le3\}$, then Lemma 8.6(b) implies that $|A|\ll\log(2+|t|)$. Since all zeroes with $|\gamma-t|\le1$ are in $A$ (because $0\le\beta\le1$), part (a) of the lemma follows.


(b) When $\sigma>2$, the result is trivially true, since $(\zeta'/\zeta)(s)=O(1)$ for such $s$, as well as $|s-\rho|\ge1$ for all zeroes with $|t-\gamma|\le1$ (and there are $O(\log(|t|+2))$ such zeroes).

Next, assume that $-1\le\sigma\le2$, so that $|s-2-it|\le3$. Let $A'=\{\rho:|\rho-2-it|\le6\}$ and $A''=\{\rho:|t-\gamma|\le1\}$. Lemma 8.6(a) (applied again to $g(z)=\zeta(2+it+z)(1+it+z)$ with $R=3$) implies that
$$\Big|\frac{\zeta'}{\zeta}(s)+\frac{1}{s-1}-\sum_{\rho\in A'}\frac{1}{s-\rho}\Big|\ll\log(2+|t|).$$
Note that $|s-\rho|\ge1$ when $\rho\in A'\setminus A''$, and there are $\le|A'|\ll\log(2+|t|)$ such zeroes. This completes the proof of part (b) when $\sigma\in[-1,2]$.


Finally, assume that $\sigma<-1$ and $|s+2n|\ge1/2$ for all $n\in\mathbb{N}$. Then we have $|\cot(\pi s/2)|\ll1$. Thus, the functional equation (6.2) and Exercise
b) imply that 


$$\frac{\zeta'}{\zeta}(s)=-\frac{\zeta'}{\zeta}(1-s)-\log|1-s|+O(1)=O(\log|s|).\tag{8.9}$$
This completes the proof of Lemma 8.2 in all cases.




Exercises 


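Exercise 8.1. Let $x,T\ge2$ and $\tau\in\mathbb{R}$.
(a) Adapt the proof of Theorem 5.1 to prove that
$$\sum_{n\le x}\Lambda(n)n^{i\tau}=\frac{x^{1+i\tau}}{1+i\tau}-\sum_{|\gamma+\tau|\le T}\frac{x^{\rho+i\tau}}{\rho+i\tau}+O\Big(\frac{x\log^2(x(|\tau|+T))}{T}+\log x\Big).$$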


(b) Assuming the Riemann Hypothesis, prove that
$$\sum_{n\le x}\Lambda(n)n^{i\tau}=\frac{x^{1+i\tau}}{1+i\tau}+O\big(\sqrt x\log^2(x+|\tau|)\big).$$
Exercise 8.2.
(a) Show that if $N\in\mathbb{Z}_{\ge2}$ is not a prime power, then $\psi(N)=N-\lim_{T\to\infty}\sum_{|\gamma|\le T}N^\rho/\rho-\log(2\pi)+o_{N\to\infty}(1)$. [Hint: First, prove a version of Theorem 7.2 with a better error term. (Solution in [31, Chapter 17].)]


(b) Show that $\zeta$ must have infinitely many non-trivial zeroes.

(c) For each $\epsilon>0$, show that there is at least one non-trivial zero $\rho$ with $\mathrm{Re}(\rho)\ge1/2-\epsilon$. Conclude that we cannot have $\psi(x)=x+O(x^{1/2-\epsilon})$ for all $x\ge1$.

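Exercise 8.3.


(a) Consider $\phi$ and $\Phi$ as in Exercise 7.2(b). For $x>1$ and $s\in\mathbb{C}\setminus\{1\}$ not coinciding with any zero of $\zeta$, show that
$$\sum_{n\ge1}\frac{\Lambda(n)\phi(n/x)}{n^s}=x^{1-s}\Phi(1-s)-\sum_\rho x^{\rho-s}\Phi(\rho-s)-\phi(0)\frac{\zeta'}{\zeta}(s)-\sum_{n=1}^\infty x^{-2n-s}\Phi(-2n-s).$$


(b) When $s=0$ and $x=1$, simplify the above formula to
$$\sum_{n\ge1}\Lambda(n)\phi(n)=\int_0^\infty\phi(y)\,dy-\sum_\rho\Phi(\rho)-\phi(0)\log(2\pi)+\frac12\int_1^\infty\phi'(y)\log(1-1/y^2)\,dy.$$
Deduce that $\zeta$ has infinitely many non-trivial zeroes.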


Exercise 8.4. Write $c_1$ for the constant in Theorem 8.3 and let $\delta_t=c_1/\log(|t|+2)$.
(a) For $\sigma\ge1-0.5\delta_t$ and $|t|\ge3$, show that $(\zeta'/\zeta)(s)\ll\log^2|t|$.


(b) In the same range of $s$, improve the above bound to
$$(\zeta'/\zeta)(s)\ll\log|t|.$$
[Hint: Prove that $\mathrm{Re}(-(\zeta'/\zeta)(s))\le O(\log|t|)$ for $\sigma\ge1-\delta_t$ and that $(\zeta'/\zeta)(1+\delta_t+it)\ll\log|t|$. Then, use Lemma 8.5.]




(c) For $\sigma\ge1-0.5\delta_t$ and $|t|\ge3$, prove that $|\log\zeta(s)|\le\log\log|t|+O(1)$. [Hint: When $\sigma\le1+\delta_t$, show that $\log\zeta(s)=\log\zeta(1+\delta_t+it)+O(1)$.]
(d) Show that there is a constant $c_2>0$ such that
$$\sum_{n\le x}\mu(n)\ll xe^{-c_2\sqrt{\log x}}\qquad(x\ge2).$$
[Hint: Start with Theorem 7.2 with $\alpha=1+1/\log x$, and then replace the contour $[\alpha-iT,\alpha+iT]$ by the contour
$$L=[\alpha-iT,\alpha'-iT]+[\alpha'-iT,\alpha'+iT]+[\alpha'+iT,\alpha+iT],$$
where $\alpha'=1-0.5\delta_T$. To bound $1/\zeta$ on $L$, note that $|1/\zeta|\le\exp\{|\log\zeta|\}$.]
Exercise 8.5. Assume the Riemann Hypothesis and fix $\epsilon\in(0,1/2)$.
(a) Adapt the argument of Exercise 8.4 to prove that $(\zeta'/\zeta)(s),\ \log\zeta(s)\ll_\epsilon\log|t|$ for $|t|\ge2$, $\sigma\ge1/2+\epsilon$.
(b) For $|t|\ge2$ and $\sigma\ge1/2+\epsilon$, prove that


log ¢(s), (¢'/¢)(s) Ke (log |e]? ma, 
Infer the Lindelöf Hypothesis. [Hint: Adapt the proof of Theorem 6.2.]


Use Exercise 8.3(a) to give an alternative solution to part (b). [Hint: Rearranging the terms in Exercise 8.3(a) gives an expression for $(\zeta'/\zeta)(s)$. If $\mathrm{supp}(\phi)\subseteq[0,2]$, then $\sum_n\Lambda(n)\phi(n/x)/n^s\ll x^{1-\sigma}\log x$. On the other hand, using an estimate of the form (77), the sum over zeroes $\rho$ is $O_\epsilon(x^{1/2-\sigma}\log|t|)$. Optimize $x$.]




Exercise 8.6. Prove that the Riemann Hypothesis is equivalent to each of the following statements:
(a) For each $\epsilon>0$, we have $\psi(x)=x+O_\epsilon(x^{1/2+\epsilon})$ uniformly in $x\ge1$.


(b) For each $\epsilon>0$, we have $\sum_{n\le x}\mu(n)\ll_\epsilon x^{1/2+\epsilon}$ uniformly in $x\ge1$.


Exercise 8.7* ([31, Chapters 11–12]). Let $\xi(s)=\pi^{-s/2}\Gamma(s/2)\zeta(s)s(s-1)/2$.

(a) Prove that $\xi$ is entire, satisfies the functional equation $\xi(s)=\xi(1-s)$, and its zeroes are precisely the non-trivial zeroes of $\zeta$, occurring with the same multiplicity.

(b) Prove that $|\xi(s)|\le\exp\{0.5|s|\log|s|+O(|s|)\}$ for $|s|\ge1$. [Hint: It suffices to consider the case when $\mathrm{Re}(s)\ge1/2$.]

(c) Prove that the Hadamard product $h(s)=\prod_\rho(1-s/\rho)e^{s/\rho}$ converges absolutely and uniformly on compact subsets of $\mathbb{C}$ (see Exercise 1.14). Deduce that there is an entire function $Q$ such that $\xi=he^Q$.

(d) Prove that $\xi(s)/h(s)\ll\exp\{O(|s|\log^2|s|)\}$ for $|s|\ge3$, as follows:

(i) Let $n_s$ denote the number of zeroes of $\xi$ in the disk $\{z:|z-s|\le1\}$ counted with multiplicity. Show that $n_s\ll\log|s|$.

(ii) Show that there is $r\in[0,1]$ such that all zeroes of $\xi$ are at distance $\ge1/(2n_s+2)\gg1/\log|s|$ from the circle $\{z:|z-s|=r\}$.

(iii) Fix $z$ on the circle $|z-s|=r$. If $|\rho|\ge2|z|$, prove that $|(1-z/\rho)e^{z/\rho}|=e^{O(|z|^2/|\rho|^2)}$; if $|\rho|<2|z|$, prove that $|(1-z/\rho)e^{z/\rho}|\gg1/(|s|\log|s|)$.

(iv) If $|z-s|=r$, prove that $\xi(z)/h(z)\ll\exp\{O(|s|\log^2|s|)\}$. Then use the maximum modulus principle to bound $\xi(s)/h(s)$.

(e) Use Lemma 8.5 to prove that $Q(s)\ll|s|\log^2|s|$ for $|s|\ge3$. Conclude that $Q(s)=A+Bs$ for some $A,B\in\mathbb{C}$. [Hint: If $Q(s)=\sum_{n=0}^\infty c_ns^n$, then $c_n=(1/2\pi i)\oint_{|s|=R}Q(s)s^{-n-1}\,ds$ for any $R$.]

(f) Show that $e^A=\xi(0)=1/2$ and $B=(\xi'/\xi)(0)=(-\gamma+\log(4\pi))/2-1$.

(g) Prove that $-(\xi'/\xi)(0)=(\xi'/\xi)(1)=B+\sum_\rho(1/\rho+1/(1-\rho))$. Conclude that $\lim_{T\to\infty}\sum_{|\gamma|\le T}1/\rho=(\gamma-\log(4\pi))/2+1$.


Exercise 8.8* ([31, Chapter 15]). Let $N(T)$ be the number of zeroes of $\zeta(s)$ in the rectangle $0\le\sigma\le1$, $0\le t\le T$, and assume that $T$ does not coincide with the ordinate of a zero.

(a) Let $C$ be a contour that does not self-intersect, parametrized by the map $\phi:[0,1]\to C$ (i.e., $\phi$ is surjective and $\phi|_{(0,1)}$ is injective). Moreover, let $f$ be a holomorphic function that is defined in an open neighborhood of $C$ and does not vanish on $C$.

(i) Show that there is an open, simply connected domain $\Omega$ such that $\Omega\cap C=C\setminus\{\phi(0),\phi(1)\}$ and $f(s)\ne0$ for all $s\in\Omega$. In particular, we may define $\log f(s)$ on $\Omega$.

(ii) Define the variation of the argument of $f$ along $C$ by
$$\Delta_C\arg f(s)=\mathrm{Im}\big(\log f(\phi(1^-))-\log f(\phi(0^+))\big).\tag{8.10}$$
Show that $\Delta_C\arg f(s)=\mathrm{Im}\int_C(f'/f)(s)\,ds$. In particular, $\Delta_C\arg f(s)$ is independent of the choice of $\phi$ and of the branch of $\log f$.

(b) Let $\xi(s)$ be as in Exercise 8.7, and let $\mathcal{R}$ be the rectangle with vertices $2$, $2+iT$, $-1+iT$ and $-1$, traversed counterclockwise. Prove that $\xi(s)\ne0$ for $s\in\mathcal{R}$, as well as that
$$2\pi N(T)=\Delta_{\mathcal{R}}\arg\xi(s).$$
(c) If $L=[2,2+iT]+[2+iT,1/2+iT]$, then prove that
$$\Delta_{\mathcal{R}}\arg\xi(s)=2\Delta_L\arg\xi(s).$$
[Hint: Show that $\xi(\sigma+it)=\overline{\xi(1-\sigma+it)}$.]
(d) Use Stirling's formula to prove that
$$\Delta_L\arg\Gamma(s/2+1)=\frac T2\log\frac{T}{2e}+\frac{3\pi}{8}+O(1/T).$$
(e) Prove that $\Delta_L\arg\zeta(s)=O(\log T)$, and conclude that
$$N(T)=\frac{T}{2\pi}\log\frac{T}{2\pi e}+O(\log T).$$
[Hint: Note that $\log\zeta(2+iT)=O(1)$. Then use Lemma 8.2 to control $\Delta_{[2+iT,1/2+iT]}\arg\zeta(s)$.]


Chapter 9 


Dirichlet characters 


Having obtained a firm understanding of the frequency of occurrence of primes, we turn to other aspects of their distribution. Specifically, we would like to know what kind of patterns occur among them. Perhaps the simplest such question one can ask is whether there are primes in a given arithmetic progression $a\pmod q$, that is to say, primes of the form $qn+a$, with $q$ and $a$ being fixed and $n$ varying. A natural restriction is that $a$ and $q$ must be coprime but, other than that, there is no reason a priori why the primes should have any preference for any particular reduced residue class mod $q$. Thus, Occam's razor leads us to the prediction that
$$\pi(x;q,a):=\#\{p\le x:p\equiv a\pmod q\}\sim\frac{x}{\varphi(q)\log x}\qquad(x\to\infty)\tag{9.1}$$
for each pair of fixed and coprime natural numbers $a$ and $q$.
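The prediction (9.1) can be observed numerically. The sketch below (helper names ours) counts primes in each reduced residue class mod $q$ and compares them with $x/(\varphi(q)\log x)$; at the modest height used here the ratios should come out close to one another but somewhat above 1, because $x/\log x$ is only a crude approximation to $\pi(x)$.

```python
import math


def prime_flags(n):
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = [False] * len(range(p * p, n + 1, p))
    return flags


def primes_in_progressions(x, q):
    """Return {a: pi(x; q, a)} for the reduced residues 0 <= a < q."""
    counts = {a: 0 for a in range(q) if math.gcd(a, q) == 1}
    for p, is_prime in enumerate(prime_flags(x)):
        if is_prime and p % q in counts:  # primes dividing q are skipped
            counts[p % q] += 1
    return counts


x, q = 10**5, 10
counts = primes_in_progressions(x, q)
prediction = x / (len(counts) * math.log(x))  # len(counts) = phi(q)
for a, c in counts.items():
    print(a, c, round(c / prediction, 3))
```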


Given the success of the Dirichlet series approach to the study of $\pi(x)$, it is tempting to introduce the series $\sum_{p\equiv a\,(\mathrm{mod}\,q)}1/p^s$. However, it is not obvious how to analyze this function because the condition $p\equiv a\pmod q$ does not behave well under multiplication, and thus there is no obvious analogue of $\zeta$. We will circumvent this problem using the ring structure of $\mathbb{Z}/q\mathbb{Z}$.

To explain the idea, suppose we want to count primes $p\equiv1\pmod4$. Instead of counting them on their own, we note that they have a natural counterpart, the primes $p\equiv3\pmod4$. Since every prime $p>2$ is either $1\pmod4$ or $3\pmod4$, we have the linear relation
$$\pi(x;4,1)+\pi(x;4,3)=\pi(x)-1\sim x/\log x.$$


Thus, instead of showing that $\pi(x;4,1)\sim x/(2\log x)$, it suffices to prove that $\pi(x;4,1)\sim\pi(x;4,3)$ or, equivalently, that $\pi(x;4,1)-\pi(x;4,3)=o(x/\log x)$. We write


$$\pi(x;4,1)-\pi(x;4,3)=\sum_{p\le x}\varepsilon(p),$$
where $\varepsilon(2)=0$, $\varepsilon(p)=1$ if $p\equiv1\pmod4$ and $\varepsilon(p)=-1$ if $p\equiv3\pmod4$. The function $\varepsilon$ extends naturally to a 4-periodic function: we let $\varepsilon(0)=\varepsilon(2)=0$, $\varepsilon(1)=1$ and $\varepsilon(3)=-1$, and then define $\varepsilon(n)$ according to the remainder of $n$ mod 4. The key observation is that $\varepsilon$ is completely multiplicative over $\mathbb{Z}$, i.e., $\varepsilon(mn)=\varepsilon(m)\varepsilon(n)$ for all $m,n\in\mathbb{Z}$. Hence the Dirichlet series $\sum_p\varepsilon(p)/p^s$ is closely related to the logarithm of $E(s)=\sum_{n=1}^\infty\varepsilon(n)/n^s$. Furthermore, the periodicity and multiplicativity of $\varepsilon$ allow us to get our hands on $E(s)$ much like we did with $\zeta(s)$.
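Both claims about $\varepsilon$ are easy to test. The Python sketch below (function names ours) checks complete multiplicativity on a window of integers, including negative ones, and computes $\pi(x;4,1)-\pi(x;4,3)$ as $\sum_{p\le x}\varepsilon(p)$.

```python
def eps(n):
    """The 4-periodic function with eps(1) = 1, eps(3) = -1 and eps(even) = 0."""
    return (0, 1, 0, -1)[n % 4]  # Python's % gives a remainder in {0, 1, 2, 3}


# complete multiplicativity over Z, checked on a window that includes negatives
assert all(eps(m * n) == eps(m) * eps(n) for m in range(-60, 61) for n in range(-60, 61))


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def prime_race(x):
    """pi(x; 4, 1) - pi(x; 4, 3), computed as the sum of eps(p) over p <= x."""
    return sum(eps(p) for p in range(2, x + 1) if is_prime(p))


print([prime_race(x) for x in (10, 100, 1000, 10000)])
```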


Suppose now, more generally, that we want to study the primes in some reduced arithmetic progression mod $q$. Instead of considering one residue class on its own, we consider all of them simultaneously. Given complex numbers $c_a$ indexed by $a\in(\mathbb{Z}/q\mathbb{Z})^*$, we form the linear combination
$$\sum_{a\in(\mathbb{Z}/q\mathbb{Z})^*}c_a\pi(x;q,a)=\sum_{p\le x}f(p),$$
where $f(p)=0$ if $p\mid q$ and $f(p)=c_a$ if $p\equiv a\pmod q$ with $(a,q)=1$. We extend $f$ to a $q$-periodic function over $\mathbb{Z}$ by letting $f(n)=0$ if $(n,q)>1$ and $f(n)=c_a$ if $n\equiv a\pmod q$ with $(a,q)=1$. We wish to find choices of coefficients $c_a$ for which $f$ is a completely multiplicative function over $\mathbb{Z}$, similarly to the function $\varepsilon$ above.


We are thus naturally led to the concept of Dirichlet characters: given $q\in\mathbb{N}$, we say that the function $\chi:\mathbb{Z}\to\mathbb{C}$ is a Dirichlet character mod $q$ if:

• $\chi$ is $q$-periodic;
• $\chi(n)\ne0$ if and only if $(n,q)=1$;
• $\chi$ is completely multiplicative over $\mathbb{Z}$.


As their name and the preceding discussion indicate, these objects were introduced by Dirichlet in his pioneering work on primes in arithmetic progressions. In the language of group theory, Dirichlet characters are in one-to-one correspondence with group homomorphisms from $(\mathbb{Z}/q\mathbb{Z})^*$ to $\mathbb{C}^*$, namely 1-dimensional representations of the group $(\mathbb{Z}/q\mathbb{Z})^*$. This correspondence is given by associating to each $\chi$ the group homomorphism $\tilde\chi:(\mathbb{Z}/q\mathbb{Z})^*\to\mathbb{C}^*$ defined by $\tilde\chi(n\ (\mathrm{mod}\ q))=\chi(n)$. Basic group theory implies that $\tilde\chi$ takes values on the unit circle. In particular, $|\chi|\le1$.
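For a prime modulus $q$, all Dirichlet characters can be written down explicitly, using the standard fact that $(\mathbb{Z}/q\mathbb{Z})^*$ is cyclic (compare Exercise 9.4(c)): if $g$ is a generator, a character is determined by $\chi(g)$, which must be a $(q-1)$th root of unity. The sketch below (names ours) builds them this way and checks the three defining properties on a window of integers.

```python
import cmath
import math


def primitive_root(q):
    """Smallest generator of (Z/qZ)^* for a prime q (brute force)."""
    if q == 2:
        return 1
    for g in range(2, q):
        if len({pow(g, k, q) for k in range(1, q)}) == q - 1:
            return g
    raise ValueError("q must be prime")


def characters_mod_prime(q):
    """All Dirichlet characters mod a prime q, each as a list of values chi(0..q-1).

    If g generates (Z/qZ)^*, chi is determined by chi(g), a (q-1)-th root of unity.
    """
    g = primitive_root(q)
    index = {pow(g, k, q): k for k in range(q - 1)}  # discrete logarithms base g
    chars = []
    for j in range(q - 1):
        values = [0j] * q  # chi(n) = 0 when q divides n
        for a, k in index.items():
            values[a] = cmath.exp(2j * math.pi * j * k / (q - 1))
        chars.append(values)
    return chars


q = 7
chars = characters_mod_prime(q)
for values in chars:
    chi = lambda n: values[n % q]  # q-periodic extension to Z
    assert all(abs(chi(m * n) - chi(m) * chi(n)) < 1e-12
               for m in range(-20, 21) for n in range(-20, 21))
    assert all((chi(n) != 0) == (math.gcd(n, q) == 1) for n in range(-20, 21))
print(len(chars))  # phi(7) = 6 characters
```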


As we will see in the next chapter, there are exactly $\varphi(q)$ Dirichlet characters $\chi$ mod $q$, and they provide an orthonormal basis for the Hilbert space of functions $f:(\mathbb{Z}/q\mathbb{Z})^*\to\mathbb{C}$ equipped with the inner product
$$\langle f,g\rangle=\frac{1}{\varphi(q)}\sum_{a\in(\mathbb{Z}/q\mathbb{Z})^*}f(a)\overline{g(a)}.$$
This means that each character $\chi$ introduces an independent linear relation
$$\sum_{a\in(\mathbb{Z}/q\mathbb{Z})^*}\chi(a)\pi(x;q,a)=\sum_{p\le x}\chi(p).\tag{9.2}$$
a€(Z/qZ)* px 


Moreover, the aforementioned orthonormality of the Dirichlet characters makes it easy to invert these linear relations: we have
$$\pi(x;q,a)=\frac{1}{\varphi(q)}\sum_{\chi\,(\mathrm{mod}\,q)}\overline{\chi(a)}\sum_{p\le x}\chi(p).\tag{9.3}$$
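Formula (9.3) can be verified directly for a small prime modulus. The sketch below (names ours) rebuilds the characters mod a prime $q$ from a generator of $(\mathbb{Z}/q\mathbb{Z})^*$, evaluates the right side of (9.3), and compares it with a direct count; we take $q=5$ and the primes below 2000 as an arbitrary test case.

```python
import cmath
import math


def characters_mod_prime(q):
    """All Dirichlet characters mod a prime q, as lists [chi(0), ..., chi(q-1)]."""
    # a generator of the cyclic group (Z/qZ)^*, found by brute force
    g = next(g for g in range(1, q) if len({pow(g, k, q) for k in range(1, q)}) == q - 1)
    index = {pow(g, k, q): k for k in range(q - 1)}  # discrete logarithm base g
    chars = []
    for j in range(q - 1):
        values = [0j] * q
        for a, k in index.items():
            values[a] = cmath.exp(2j * math.pi * j * k / (q - 1))
        chars.append(values)
    return chars


def pi_via_characters(primes, q, a):
    """Right-hand side of (9.3) for a prime modulus q, where phi(q) = q - 1."""
    total = 0j
    for chi in characters_mod_prime(q):
        total += chi[a % q].conjugate() * sum(chi[p % q] for p in primes)
    return total / (q - 1)


primes = [p for p in range(2, 2000) if all(p % d for d in range(2, math.isqrt(p) + 1))]
for a in range(1, 5):
    direct = sum(1 for p in primes if p % 5 == a)
    print(a, direct, pi_via_characters(primes, 5, a))
```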


Examining the above formula, we discover a Dirichlet character mod $q$ that stands out: the function $n\mapsto1_{(n,q)=1}$. We call it the principal character mod $q$ and denote it by $\chi_0$. Its contribution to $\pi(x;q,a)$ equals
$$\frac{1}{\varphi(q)}\sum_{p\le x,\ p\nmid q}1=\frac{\pi(x)}{\varphi(q)}+O\Big(\frac{\log q}{\varphi(q)}\Big),\tag{9.4}$$
since there are $\le\log q/\log2$ prime divisors of $q$. We thus see that $\chi_0$ naturally provides us with the conjectured main term in (9.1). Consequently, proving (9.1) amounts to showing that
$$\sum_{p\le x}\chi(p)=o_{x\to\infty}(\pi(x))\qquad\text{for }\chi\ne\chi_0.\tag{9.5}$$
To estimate the left-hand side of (9.5), we introduce the Dirichlet series
$$L(s,\chi)=\sum_{n=1}^\infty\frac{\chi(n)}{n^s}.$$
This series is called the Dirichlet $L$-function associated to $\chi$. Since $\chi$ is periodic and multiplicative, the behavior of $L(s,\chi)$ can be analyzed using tools analogous to the ones used to study $\zeta$. In particular, we will prove that $L(s,\chi)$ can be analytically continued to the entire complex plane when $\chi\ne\chi_0$. For now, we note that $L(s,\chi)$ converges absolutely for $\mathrm{Re}(s)>1$ because $|\chi|\le1$. Moreover, the complete multiplicativity of $\chi$ and Theorem imply that
$$L(s,\chi)=\prod_p\Big(1-\frac{\chi(p)}{p^s}\Big)^{-1}\qquad(\mathrm{Re}(s)>1).$$
Taking logarithms, we find that $\sum_p\chi(p)/p^s=\log L(s,\chi)+O(1)$ for $\mathrm{Re}(s)>1$, which provides the link between the sum in (9.5) and $L(s,\chi)$.
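For the character $\varepsilon$ mod 4 from the beginning of the chapter, $L(2,\varepsilon)$ is Catalan's constant $G=0.9159655941\ldots$, so the Euler product and the link with $\sum_p\chi(p)/p^s$ can be checked numerically. A sketch, in which the truncation levels and the helper names are our choices:

```python
import math


def chi4(n):
    """The non-principal character mod 4 (the function epsilon above)."""
    return (0, 1, 0, -1)[n % 4]


def primes_up_to(n):
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = [False] * len(range(p * p, n + 1, p))
    return [p for p, is_p in enumerate(flags) if is_p]


def L_partial_sum(s, N):
    """sum_{n <= N} chi4(n) / n^s."""
    return sum(chi4(n) / n ** s for n in range(1, N + 1))


def L_euler_product(s, P):
    """prod_{p <= P} (1 - chi4(p)/p^s)^{-1}."""
    result = 1.0
    for p in primes_up_to(P):
        result /= 1 - chi4(p) / p ** s
    return result


def prime_sum(s, P):
    """sum_{p <= P} chi4(p) / p^s, which should equal log L(s, chi4) + O(1)."""
    return sum(chi4(p) / p ** s for p in primes_up_to(P))


G = 0.915965594177219  # Catalan's constant = L(2, chi4)
print(L_partial_sum(2, 10**5) - G, L_euler_product(2, 10**5) - G)
print(math.log(G), prime_sum(2, 10**5))
```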




As in the proof of the Prime Number Theorem, it is more convenient to work with the logarithmic derivative of $L(s,\chi)$, for which we have
$$-\frac{L'}{L}(s,\chi)=\sum_{n=1}^\infty\frac{\chi(n)\Lambda(n)}{n^s}\qquad(\mathrm{Re}(s)>1).$$
Instead of $\sum_{p\le x}\chi(p)$ and $\pi(x;q,a)$, we then estimate
$$\psi(x,\chi)=\sum_{n\le x}\chi(n)\Lambda(n)\qquad\text{and}\qquad\psi(x;q,a)=\sum_{\substack{n\le x\\ n\equiv a\,(\mathrm{mod}\,q)}}\Lambda(n).$$
To proceed with this task, we use Perron's formula (Theorem 7.2) to write $\psi(x,\chi)$ in terms of the Dirichlet series $(-L'/L)(s,\chi)$. The analogy with the theory of $\psi(x)$ is now clear: the zeroes of $L(s,\chi)$ determine the poles of $(-L'/L)(s,\chi)$ and hence the asymptotic behavior of $\psi(x,\chi)$. In fact, in Chapter 11 we will show a generalization of the explicit formula (5.11):


$$\psi(x,\chi)=-\sum_{|\gamma|\le T}\frac{x^\rho-1}{\rho}+O\Big(\frac{x\log^2(qx)}{T}\Big)\tag{9.6}$$
uniformly for $x\ge T\ge2$ and non-principal characters $\chi\pmod q$. As in (5.11), we write $\rho=\beta+i\gamma$ for a non-trivial zero of $L(s,\chi)$, which necessarily lies inside the critical strip $0<\mathrm{Re}(s)<1$. In addition, zeroes are summed according to their multiplicities. There is a small difference, though: the function $L(s,\chi)$ could have a zero at $s=0$, which is the reason why the summands in (9.6) are slightly modified compared to those of (5.11).


Given (9.6), proving (9.1) is reduced to showing a zero-free region for $L(s,\chi)$.


The strategy we outlined above is carried out in the subsequent three chapters: Chapter 10 is dedicated to the study of the character theory of finite abelian groups and its applications to Dirichlet characters, and Chapter 11 to the study of Dirichlet $L$-functions and to the proof of (9.6). Finally, in Chapter 12 we prove the necessary zero-free regions for Dirichlet $L$-functions to establish a uniform version of (9.1).


Exercises 


Exercise 9.1. Find all Dirichlet characters mod 2, 3, 4, 5, 8 and 15. In each case, calculate $\sum_{\chi\,(\mathrm{mod}\,q)}\chi(a)$ for all $a=0,1,\dots,q-1$, as well as $\sum_{a=0}^{q-1}\chi(a)$ for all $\chi\pmod q$.


Exercise 9.2. Prove that if $\chi$ is a Dirichlet character mod $q$ and $(n,q)=1$, then $|\chi(n)|=1$. [Hint: Show that there is some integer $k\ge1$ such that $\chi(n)^k=1$.]




Exercise 9.3. Let $\chi$ be a Dirichlet character mod $q$.
(a) Prove that $\chi(-1)\in\{-1,1\}$.
(b) Prove that $\chi$ is a real-valued function if and only if $\chi^2=\chi_0$.


(c) If $\chi$ is real-valued and $q$ is prime, then prove that either $\chi=\chi_0$ or $\chi$ is the Legendre symbol $(\cdot|q)$. [Hint: Evaluate $\chi(n)$.]


Exercise 9.4.

(a) If $\chi_j$ is a Dirichlet character mod $q_j$ for $j=1,2$, then show that $\chi=\chi_1\chi_2$ is a character mod $[q_1,q_2]$.

(b) Conversely, if $\chi$ is a Dirichlet character mod $q$, and $q=q_1q_2$ with $(q_1,q_2)=1$, construct characters $\chi_j\pmod{q_j}$ for $j=1,2$ such that $\chi=\chi_1\chi_2$. [Hint: For each $n\in\mathbb{Z}$, there is a unique class $a\pmod q$ such that $a\equiv n\pmod{q_1}$ and $a\equiv1\pmod{q_2}$. Define $\chi_1(n):=\chi(a)$.]

(c) If $p$ is an odd prime, we know that the group $(\mathbb{Z}/p^k\mathbb{Z})^*$ is cyclic for each $k\ge1$. Construct all Dirichlet characters mod $p^k$.

(d) Fix $k\ge3$. We know that for each odd $n$, there are unique integers $a\in\{0,1\}$ and $b\in\{0,1,\dots,2^{k-2}-1\}$ such that $n\equiv(-1)^a5^b\pmod{2^k}$. Use this fact to construct all Dirichlet characters mod $2^k$.

(e) Construct all Dirichlet characters mod $q$ and deduce that there are exactly $\varphi(q)$ of them.


Exercise 9.5. A character $\chi\pmod q$ is called faithful if $\chi(m)=\chi(n)\ne0$ implies that $m\equiv n\pmod q$. Otherwise, $\chi$ is called unfaithful. If $q$ is prime, then show that there are $\varphi(q-1)$ faithful Dirichlet characters mod $q$.


Exercise 9.6. Let $\chi$ be a Dirichlet character mod $q$. An integer $d\ge1$ is called a period of $\chi$ if $\chi(m)=\chi(n)$ whenever $m\equiv n\pmod d$ and $(mn,q)=1$.

(a) Show that $d$ is a period of $\chi$ if and only if $\chi(n)=1$ whenever $n\equiv1\pmod d$ and $(n,q)=1$.

(b) Show that if $d_1,d_2$ are periods of $\chi$, then so is $(d_1,d_2)$. [Hint: If $m,n,k,\ell\in\mathbb{Z}$ are such that $(mn,q)=1$ and $m=n+kd_1+\ell d_2$, then show that there is an integer $a$ such that $(n+(k-ad_2)d_1,q)=1$.]

(c) Show that there is a divisor $d^*\ge1$ of $q$ such that the set of periods of $\chi$ is the set of multiples of $d^*$. We call $d^*$ the conductor of $\chi$.

(d) If $\chi$ is faithful, show that $d^*=q$, but the converse is not always true.

(e) Let $\chi=\chi_1\chi_2$ be as in Exercise 9.4(a) with $(q_1,q_2)=1$, and let $d^*$, $d_1^*$ and $d_2^*$ be their conductors, respectively. Prove that $d^*=d_1^*d_2^*$.

Exercise 9.7. Let $\chi$ be a real, non-principal character mod $p^k$ with $p$ prime.

(a) If $p>2$, prove that $\chi(n)=(n|p)$ and that its conductor is $p$. [Hint: Use Hensel's lemma to study the congruence $x^2\equiv n\pmod{p^k}$.]

(b) If $p^k=8$, prove there are three possibilities for $\chi$: two of conductor 8 and one of conductor 4.


(c) If $p=2$ and $k\ge3$, then prove that $\chi(n)=\omega(n\ (\mathrm{mod}\ 8))$, where $\omega$ is one of the three characters in part (b).


Chapter 10


Fourier analysis on finite abelian groups


As we saw in the previous chapter, Dirichlet characters are in correspondence with group homomorphisms from $(\mathbb{Z}/q\mathbb{Z})^*$ to $\mathbb{C}^*$. More generally, a character of an abelian group $(G,\cdot)$ is a group homomorphism $\chi:G\to\mathbb{C}^*$. We write $\widehat G$ for the set of all characters of $G$. The constant function 1 is obviously a character, called the principal character of $G$ and often denoted by $\chi_0$. All other characters are called non-principal.


The set $\widehat G$ admits a natural group structure with the operation being the multiplication of complex-valued functions. The group $(\widehat G,\cdot)$ is called the dual group of $G$ or the group of characters of $G$.

From now on, we assume that $G$ is finite. In this case, $\widehat G$ is also finite. Indeed, if $n=|G|$, then $g^n=1$ by Lagrange's theorem. In particular, $\chi(g)^n=\chi(g^n)=\chi(1)=1$, so each character takes values in the finite set of $n$th roots of unity. We infer that $\widehat G$ is finite and that $1/\chi=\overline\chi$.

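The set $\widehat G$ is a subset of the set of functions from $G$ to $\mathbb{C}$. We denote the latter set by $L^2(G)$ because it naturally forms a Hilbert space over $\mathbb{C}$ with respect to the inner product
$$\langle f_1,f_2\rangle_G=\frac{1}{|G|}\sum_{g\in G}f_1(g)\overline{f_2(g)}.$$
Clearly, $\dim_{\mathbb{C}}(L^2(G))=|G|$. A fundamental property of $\widehat G$ is that it forms an orthonormal basis of $L^2(G)$.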

We begin by showing that $\widehat G$ is an orthonormal set, that is to say, $\langle\chi,\psi\rangle_G=1_{\chi=\psi}$ for all characters $\chi,\psi\in\widehat G$. The case $\chi=\psi$ follows immediately from the fact that $\chi$ takes values on the unit circle. On the other hand, if $\chi\ne\psi$ and we set $\xi=\chi\overline\psi$, then we must prove that $\sum_{g\in G}\xi(g)=0$. Indeed, our assumption that $\chi\ne\psi$ implies the existence of an $h\in G$ such that $\xi(h)\ne1$. Since $hG=G$, we have
$$\xi(h)\sum_{g\in G}\xi(g)=\sum_{g\in G}\xi(hg)=\sum_{g\in G}\xi(g),$$
from which we infer that $\langle\chi,\psi\rangle_G=|G|^{-1}\sum_{g\in G}\xi(g)=0$.
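The orthogonality argument can be checked numerically on the cyclic group $\mathbb{Z}/q\mathbb{Z}$, whose characters are the maps $n\mapsto e^{2\pi ian/q}$ (this is the cyclic case of (10.1) discussed below). The sketch computes the Gram matrix of these characters with respect to $\langle\cdot,\cdot\rangle_G$; the names are ours.

```python
import cmath
import math


def additive_characters(q):
    """The q characters n -> exp(2 pi i a n / q) of Z/qZ, as lists of values."""
    return [[cmath.exp(2j * math.pi * a * n / q) for n in range(q)] for a in range(q)]


def inner(f, g):
    """<f, g>_G = |G|^{-1} * sum over x in G of f(x) * conj(g(x))."""
    return sum(fx * gx.conjugate() for fx, gx in zip(f, g)) / len(f)


q = 12
chars = additive_characters(q)
gram = [[inner(chi, psi) for psi in chars] for chi in chars]
max_dev = max(abs(gram[i][j] - (1 if i == j else 0)) for i in range(q) for j in range(q))
print(max_dev)  # rounding-error size: the Gram matrix is the identity
```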


Since $\widehat G$ is an orthonormal subset of $L^2(G)$, it is linearly independent. In particular, we have $|\widehat G|\le\dim_{\mathbb{C}}(L^2(G))=|G|$. To show that $\widehat G$ is a basis of $L^2(G)$, it suffices to establish the relation
$$|\widehat G|=|G|.\tag{10.1}$$
This relation is obvious if $G$ is cyclic, say $G=\langle g\rangle$ of order $n$: in this case, every character is uniquely determined by its value at $g$ which, as we saw above, must be an $n$th root of unity. Conversely, every $n$th root of unity $e^{2\pi ia/n}$ gives rise to a character $\chi$ via the relation $\chi(g^j):=e^{2\pi iaj/n}$, so (10.1) follows in this case. In the general case of a finite abelian group, relation (10.1) follows by writing $G$ as the direct product of cyclic groups, say
$$G\cong\mathbb{Z}/d_1\mathbb{Z}\times\cdots\times\mathbb{Z}/d_r\mathbb{Z},\tag{10.2}$$
and applying the following lemma, whose proof is left as an exercise.


Lemma 10.1. Let $(G_1,\cdot)$ and $(G_2,\cdot)$ be abelian groups with direct product $G$. The function $\phi:\widehat G_1\times\widehat G_2\to\widehat G$ associating the pair $(\chi_1,\chi_2)$ to the character $G\ni(g_1,g_2)\mapsto\chi_1(g_1)\chi_2(g_2)\in\mathbb{C}^*$ is a group isomorphism.


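The fact that $\widehat G$ is an orthonormal basis of $L^2(G)$ allows us to do Fourier analysis on $G$: for each $f:G\to\mathbb{C}$, we define the function $\widehat f:\widehat G\to\mathbb{C}$ by
$$\widehat f(\chi)=\langle f,\chi\rangle_G.\tag{10.3}$$
This is the Fourier transform of $f$, and it satisfies the inversion formula
$$f=\sum_{\chi\in\widehat G}\widehat f(\chi)\cdot\chi.\tag{10.4}$$
Specializing to the function $f(g)=1_{g=h}$, where $h$ is a given element of $G$, we find that
$$1_{g=h}=\frac{1}{|G|}\sum_{\chi\in\widehat G}\chi(g)\overline{\chi(h)}.\tag{10.5}$$
Finally, we have Parseval's identity
$$\sum_{\chi\in\widehat G}|\widehat f(\chi)|^2=\frac{1}{|G|}\sum_{g\in G}|f(g)|^2.\tag{10.6}$$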



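Indeed, since $|z|^2 = z\cdot\bar z$, we have
$$\sum_{\chi\in\widehat G}|\hat f(\chi)|^2 = \sum_{\chi\in\widehat G}\frac{1}{|G|^2}\sum_{g\in G}f(g)\overline{\chi(g)}\sum_{h\in G}\overline{f(h)}\chi(h) = \frac{1}{|G|^2}\sum_{g,h\in G}f(g)\overline{f(h)}\sum_{\chi\in\widehat G}\chi(h)\overline{\chi(g)},$$
and (10.6) follows from (10.5).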
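For $G=\mathbb{Z}/q\mathbb{Z}$, whose characters are the additive characters $n\mapsto e(an/q)$, the transform (10.3) becomes $\hat f(a)=\frac1q\sum_n f(n)e(-an/q)$. The inversion formula (10.4) and Parseval's identity (10.6) are then easy to test numerically. The sketch below is our own naive $O(q^2)$ illustration, not a fast Fourier transform. The function names and the random test data are arbitrary choices.

```python
import cmath
import random


def e(x: float) -> complex:
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def fourier(f: list) -> list:
    """f_hat(a) = <f, e(a . /q)> = (1/q) * sum_n f(n) e(-a n / q), for a mod q."""
    q = len(f)
    return [sum(f[n] * e(-a * n / q) for n in range(q)) / q for a in range(q)]


def inverse(f_hat: list) -> list:
    """Inversion formula (10.4): f(n) = sum_a f_hat(a) e(a n / q)."""
    q = len(f_hat)
    return [sum(f_hat[a] * e(a * n / q) for a in range(q)) for n in range(q)]


def parseval_sides(f: list) -> tuple:
    """Left- and right-hand sides of Parseval's identity (10.6) on Z/qZ."""
    q = len(f)
    lhs = sum(abs(c) ** 2 for c in fourier(f))
    rhs = sum(abs(v) ** 2 for v in f) / q
    return lhs, rhs


if __name__ == "__main__":
    random.seed(1)
    q = 12
    f = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(q)]
    print(max(abs(u - v) for u, v in zip(f, inverse(fourier(f)))))  # round-off size
    print(parseval_sides(f))  # two numbers agreeing up to round-off
```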


Additive and multiplicative characters mod q 


We now study in more detail the cases $G=\mathbb{Z}/q\mathbb{Z}$ and $G=(\mathbb{Z}/q\mathbb{Z})^*$,
the second one corresponding to Dirichlet characters. Since the operation in
$\mathbb{Z}/q\mathbb{Z}$ is addition, we call the characters of this group the additive characters
mod $q$. Similarly, we also refer to Dirichlet characters, that is to say, the
characters of $(\mathbb{Z}/q\mathbb{Z})^*$, as the multiplicative characters mod $q$.


Since $\mathbb{Z}/q\mathbb{Z}$ is a cyclic group, the discussion in the proof of (10.1) implies
that the additive characters mod $q$ are the functions $n\mapsto e(an/q)$ indexed
by $a\in\{0,1,\dots,q-1\}$, where we have introduced the symbol
$$e(x) = e^{2\pi ix}.\tag{10.7}$$
In particular, the character group of $\mathbb{Z}/q\mathbb{Z}$ is canonically isomorphic to $\mathbb{Z}/q\mathbb{Z}$.


On the other hand, the construction of the multiplicative characters
mod $q$ is explained in Exercise 9.4. In addition, a more detailed discussion
is presented in Chapter 4 and in Section 4.2.

The Fourier transform on $\mathbb{Z}/q\mathbb{Z}$ is called the additive Fourier transform
mod $q$. Using the explicit description of the character group of $\mathbb{Z}/q\mathbb{Z}$, we
may identify the additive Fourier transform of $f:\mathbb{Z}/q\mathbb{Z}\to\mathbb{C}$ with the
function $\hat f:\mathbb{Z}/q\mathbb{Z}\to\mathbb{C}$ given by the formula
$$\hat f(a) = \frac1q\sum_{n\in\mathbb{Z}/q\mathbb{Z}}f(n)e(-an/q).$$
In fact, since there is a natural correspondence between functions on $\mathbb{Z}/q\mathbb{Z}$
and $q$-periodic functions, we can think of $\hat f$ as a $q$-periodic function from $\mathbb{Z}$
to $\mathbb{C}$, defined for any $q$-periodic function $f:\mathbb{Z}\to\mathbb{C}$.


Of particular importance is the interaction between additive and multiplicative characters. Recall that a Dirichlet character mod $q$ is a $q$-periodic
function. Hence, it has an additive Fourier transform mod $q$ given by
$$\hat\chi(a) = \frac1q\sum_{1\le n\le q}\chi(n)e(-an/q).$$




A quantity that plays a key role in the study of $\hat\chi$ is the Gauss sum
$$G(\chi) = \sum_{1\le n\le q}\chi(n)e(n/q) = q\hat\chi(-1).$$

The multiplicativity of $\chi$ implies the relation
$$\sum_{1\le n\le q}\chi(n)e(an/q) = G(\chi)\bar\chi(a)\quad\text{whenever }(a,q)=1.\tag{10.8}$$


This follows simply by making the change of variables m = an, which is 
invertible mod q when (a,q) = 1. 
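Relation (10.8) can be checked numerically for characters mod a prime $p$. The sketch below is our own illustration. It builds every character mod $p$ from a primitive root $g$ via $\chi_j(g^k)=e(jk/(p-1))$, which is a standard construction assumed here rather than taken from this section. It then compares both sides of (10.8) for all $a$ coprime to $p$. The function names are hypothetical helpers.

```python
import cmath
from math import gcd


def e(x: float) -> complex:
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def primitive_root(p: int) -> int:
    """Smallest generator of (Z/pZ)* for an odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("expected an odd prime")


def character_mod_prime(p: int, j: int) -> list:
    """Values chi_j(0), ..., chi_j(p-1), where chi_j(g^k) = e(j*k/(p-1)) and chi_j(0) = 0."""
    g = primitive_root(p)
    values = [0j] * p
    for k in range(p - 1):
        values[pow(g, k, p)] = e(j * k / (p - 1))
    return values


def twisted_sum(chi: list, a: int) -> complex:
    """sum_{1 <= n <= q} chi(n) e(a n / q); for a = 1 this is the Gauss sum G(chi)."""
    q = len(chi)
    return sum(chi[n % q] * e(a * n / q) for n in range(1, q + 1))


def relation_10_8_holds(chi: list, tol: float = 1e-9) -> bool:
    """Check sum_n chi(n) e(a n / q) = G(chi) * conj(chi(a)) for every a coprime to q."""
    q = len(chi)
    gauss = twisted_sum(chi, 1)
    return all(
        abs(twisted_sum(chi, a) - gauss * chi[a].conjugate()) < tol
        for a in range(1, q) if gcd(a, q) == 1
    )


if __name__ == "__main__":
    p = 11
    print(all(relation_10_8_holds(character_mod_prime(p, j)) for j in range(p - 1)))  # True
```

The check also passes for the principal character ($j=0$), as it should: (10.8) uses only multiplicativity, not primitivity.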


Setting $\lambda_\chi = \bar\chi(-1)G(\chi)/q = \chi(-1)G(\chi)/q$, relation (10.8) can also be
written as $\hat\chi(n) = \lambda_\chi\cdot\bar\chi(n)$ whenever $(n,q)=1$, which evaluates the additive
Fourier transform of $\chi$ at frequencies $n$ that are coprime to $q$. In the next
section we shall see that this formula can be extended to all frequencies
$n$ for an important class of Dirichlet characters called primitive characters.
That is to say, when $\chi$ is a primitive Dirichlet character mod $q$, we will show
that $\hat\chi = \lambda_\chi\cdot\bar\chi$. This demonstrates that primitive characters are conjugate
eigenvectors of the additive Fourier transform mod $q$.


Primitive characters 


Each Dirichlet character $\chi$ mod $q$ naturally generates a new Dirichlet character $\xi$ at every modulus $m$ that is a multiple of $q$ via the relation
$$\xi(n) = 1_{(n,m)=1}\cdot\chi(n).$$
We then say that $\chi$ induces $\xi$. Inverting our point of view, we also say that
$\xi$ is a lift of $\chi$.

Given a character $\chi$ mod $q$, a natural question is whether it is the lift of
some character $\psi$ mod $d$ with $d$ a proper divisor of $q$. If this is the case, we
say that $\chi$ is imprimitive and call the smallest such $d$ the conductor of $\chi$.
On the other hand, if such a character $\psi$ does not exist, then we say that $\chi$
is primitive. In the latter case, we define the conductor of $\chi$ to simply be
its modulus $q$.


For example, the principal character $\chi_0$ mod $q$ is induced by the principal
character mod 1, that is to say, the constant function 1 on $\mathbb{Z}$. Therefore, if
$q>1$, then $\chi_0$ is imprimitive and has conductor 1.


Being an imprimitive character $\chi$ mod $q$ means that there is a proper
divisor $d$ of $q$ (i.e., $d\mid q$ and $d<q$) that is a period of $\chi$ in the sense of Exercise 9.6. In
particular, the above definition of the conductor agrees with the one given
in Exercise 9.6(c), as the following lemma shows.




Lemma 10.2. Let $\chi$ be a Dirichlet character mod $q$. Then, $\chi$ is imprimitive
if and only if there is a proper divisor $d$ of $q$ such that $\chi(m)=\chi(n)$ whenever
$m\equiv n\pmod d$ and $(mn,q)=1$.


Proof. It is clear that if $\chi$ is induced by a character mod $d$, then $\chi(m)=\chi(n)$ whenever $(mn,q)=1$ and $m\equiv n\pmod d$. Let us now prove that the
converse statement is also true.

We define a function $\psi:\mathbb{Z}\to\mathbb{C}$ as follows: if $(a,d)>1$, we set $\psi(a)=0$.
On the other hand, if $(a,d)=1$, we note that there is some $k\in\mathbb{Z}$ such that
$(a+kd,q)=1$. We then define $\psi(a)=\chi(a+kd)$, which is independent of
the choice of $k$ by virtue of our assumption on $\chi$. The function $\psi$ is clearly
a character mod $d$ inducing $\chi$, thus proving that $\chi$ is imprimitive.
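Lemma 10.2 turns primitivity into a finite check. Since its proof constructs a character mod $d$ inducing $\chi$ from any divisor $d$ with the stated property, the smallest such $d\mid q$ is the conductor of $\chi$. The following sketch, our own illustration, searches the divisors of $q$ in increasing order. The helper `cyclic_character` is hypothetical. It assumes that $(\mathbb{Z}/q\mathbb{Z})^*$ is cyclic with a known generator, which holds for $q=9$ with generator 2 and for $q=7$ with generator 3.

```python
import cmath
from math import gcd


def e(x: float) -> complex:
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def cyclic_character(q: int, g: int, order: int, j: int) -> list:
    """chi(g^k) = e(j*k/order), chi(n) = 0 if (n, q) > 1.

    Assumes g generates (Z/qZ)*, a cyclic group of the given order.
    """
    values = [0j] * q
    for k in range(order):
        values[pow(g, k, q)] = e(j * k / order)
    return values


def conductor(chi: list, tol: float = 1e-9) -> int:
    """Smallest d | q with chi(m) = chi(n) whenever m = n (mod d) and (mn, q) = 1."""
    q = len(chi)
    units = [n for n in range(1, q + 1) if gcd(n, q) == 1]
    for d in range(1, q + 1):
        if q % d == 0 and all(
            abs(chi[m % q] - chi[n % q]) < tol
            for m in units for n in units if (m - n) % d == 0
        ):
            return d
    raise AssertionError("d = q always satisfies the condition")


if __name__ == "__main__":
    # (Z/9Z)* is cyclic of order 6, generated by 2.
    print([conductor(cyclic_character(9, 2, 6, j)) for j in range(6)])  # [1, 9, 9, 3, 9, 9]
```

The output shows four primitive characters mod 9, the principal character with conductor 1, and the quadratic character, which is induced by the non-principal character mod 3.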


Using the above lemma, we prove the following fundamental property of 
primitive characters. 


Theorem 10.3. Let $\chi$ be a primitive Dirichlet character mod $q$. For all
$n\in\mathbb{Z}$, we have
$$\sum_{1\le a\le q}\chi(a)e(an/q) = G(\chi)\bar\chi(n).$$


Proof. When $(n,q)=1$, this follows by (10.8). Assume now that $(n,q)=m>1$, in which case we need to show that
$$\sum_{1\le a\le q}\chi(a)e(an/q) = 0.$$
Write $n=\ell m$ and $q=dm$, so that $(\ell,d)=1$, and note that
$$\sum_{1\le a\le q}\chi(a)e(an/q) = \sum_{j=0}^{m-1}\sum_{b=1}^{d}\chi(b+dj)e((b+dj)\ell/d) = \sum_{b=1}^{d}e(b\ell/d)\sum_{j=0}^{m-1}\chi(b+dj).$$
So it suffices to show that
$$\sum_{j=0}^{m-1}\chi(b+dj) = 0\tag{10.9}$$
for all $b\in\{1,2,\dots,d\}$. Since $\chi$ is primitive, there is some $j_0\in\mathbb{Z}$ for which
the number $r=1+j_0d$ satisfies the relations $(r,q)=1$ and $\chi(r)\ne1$. (To
see this, combine Lemma 10.2 and Exercise 9.6(a).) In particular, when




reduced mod $q$, the numbers $r\cdot(b+dj)$ with $0\le j<m$ are a permutation of
the numbers $b+dj$ with $0\le j<m$. Consequently,
$$\chi(r)\sum_{j=0}^{m-1}\chi(b+dj) = \sum_{j=0}^{m-1}\chi(r(b+dj)) = \sum_{j=0}^{m-1}\chi(b+dj).$$
Since $\chi(r)\ne1$, relation (10.9) follows. This completes the proof of the
theorem.
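The following sketch, our own illustration, checks Theorem 10.3 numerically at every $n$ mod $q$ for characters mod 9. It builds them from the generator 2 of $(\mathbb{Z}/9\mathbb{Z})^*$ with a hypothetical helper. It also shows that primitivity cannot be dropped. For the imprimitive character mod 9 induced by the non-principal character mod 3, the sum at $n=3$ equals $3(e(1/3)-e(2/3))=3i\sqrt3$, while $G(\chi)\bar\chi(3)=0$.

```python
import cmath
import math


def e(x: float) -> complex:
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def cyclic_character(q: int, g: int, order: int, j: int) -> list:
    """chi(g^k) = e(j*k/order) and chi(n) = 0 if (n, q) > 1; g must generate (Z/qZ)*."""
    values = [0j] * q
    for k in range(order):
        values[pow(g, k, q)] = e(j * k / order)
    return values


def additive_sum(chi: list, n: int) -> complex:
    """sum_{1 <= a <= q} chi(a) e(a n / q)."""
    q = len(chi)
    return sum(chi[a % q] * e(a * n / q) for a in range(1, q + 1))


def theorem_10_3_holds(chi: list, tol: float = 1e-9) -> bool:
    """Check sum_a chi(a) e(a n / q) = G(chi) * conj(chi(n)) for all n mod q."""
    q = len(chi)
    gauss = additive_sum(chi, 1)
    return all(
        abs(additive_sum(chi, n) - gauss * chi[n % q].conjugate()) < tol
        for n in range(q)
    )


if __name__ == "__main__":
    primitive = cyclic_character(9, 2, 6, 1)    # order 6, conductor 9
    imprimitive = cyclic_character(9, 2, 6, 3)  # induced by the character mod 3
    print(theorem_10_3_holds(primitive))        # True
    print(theorem_10_3_holds(imprimitive))      # False
    print(additive_sum(imprimitive, 3), 3j * math.sqrt(3))  # approximately equal
```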


The above theorem and Parseval's formula (10.6) allow us to determine
the size of the Gauss sum for primitive Dirichlet characters.

Theorem 10.4. If $\chi$ is a primitive Dirichlet character mod $q$, then
$|G(\chi)| = \sqrt q$.
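Spelled out, the argument indicated before the theorem runs as follows (a sketch). Apply (10.6) with $G=\mathbb{Z}/q\mathbb{Z}$ and $f=\chi$. For primitive $\chi$, Theorem 10.3 gives $\hat\chi=\lambda_\chi\cdot\bar\chi$ with $\lambda_\chi=\chi(-1)G(\chi)/q$, since $\hat\chi(a)=\frac1q\sum_n\chi(n)e(-an/q)=\frac1qG(\chi)\bar\chi(-a)$. Computing $\sum_a|\hat\chi(a)|^2$ in these two ways gives:

```latex
\begin{aligned}
\sum_{a\in\mathbb{Z}/q\mathbb{Z}}|\hat\chi(a)|^2
  &= \frac1q\sum_{n\in\mathbb{Z}/q\mathbb{Z}}|\chi(n)|^2 = \frac{\varphi(q)}{q}
  && \text{by (10.6)},\\
\sum_{a\in\mathbb{Z}/q\mathbb{Z}}|\hat\chi(a)|^2
  &= |\lambda_\chi|^2\sum_{a\in\mathbb{Z}/q\mathbb{Z}}|\bar\chi(a)|^2
   = \frac{|G(\chi)|^2}{q^2}\,\varphi(q)
  && \text{by Theorem 10.3}.
\end{aligned}
```

Comparing the two lines gives $|G(\chi)|^2=q$.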
Character sums 


Notice in Theorem 10.4 that, even though $G(\chi)$ is defined as a sum of $\varphi(q)$
complex numbers on the unit circle, its modulus equals $\sqrt q$, which is approximately the square root of $\varphi(q)$ (see Corollary 3.6). This means that
the numbers $\chi(n)e(n/q)$ are sufficiently randomly placed around the unit
circle so that they largely cancel each other out when added together. This
kind of "square-root cancellation" is typical for averages involving Dirichlet
characters. We demonstrate it in two other settings.

We start by showing that Theorem 10.3 can be used to prove a generalization of the Poisson summation formula (Theorem B.3).


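Theorem 10.5. Let $f\in C^2(\mathbb{R})$ be such that $f^{(j)}(x)\ll1/x^2$ for $j\in\{0,1,2\}$
and $|x|\ge1$, so that $\hat f(\xi)\ll1/\xi^2$ for $|\xi|\ge1$.
If $\chi$ is a primitive character mod $q$ and $N\in\mathbb{R}_{>0}$, then
$$\sum_{n\in\mathbb{Z}}\chi(n)f(n/N) = \frac{G(\chi)N}{q}\sum_{n\in\mathbb{Z}}\bar\chi(n)\hat f(nN/q).$$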


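Proof. It suffices to prove the theorem when $N=1$. The general case
follows by noticing that the Fourier transform of $x\mapsto f(x/N)$ is the function
$\xi\mapsto N\hat f(N\xi)$.
We use Theorem 10.3 (applied to $\bar\chi$, which is also primitive) to write $\chi$ in terms of its additive Fourier expansion
and find that
$$\sum_{n\in\mathbb{Z}}\chi(n)f(n) = \frac{1}{G(\bar\chi)}\sum_{n\in\mathbb{Z}}f(n)\sum_{1\le a\le q}\bar\chi(a)e(an/q) = \frac{1}{G(\bar\chi)}\sum_{1\le a\le q}\bar\chi(a)\sum_{n\in\mathbb{Z}}f(n)e(an/q).$$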




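We then apply the Poisson summation formula (Theorem B.3) to the function $g(x)=f(x)e(ax/q)$, whose Fourier transform is $\hat g(\xi)=\hat f(\xi-a/q)$.
Thus
$$\begin{aligned}\sum_{n\in\mathbb{Z}}\chi(n)f(n) &= \frac{1}{G(\bar\chi)}\sum_{1\le a\le q}\bar\chi(a)\sum_{n\in\mathbb{Z}}\hat f(n-a/q)\\ &= \frac{\chi(-1)}{G(\bar\chi)}\sum_{1\le a\le q}\sum_{n\in\mathbb{Z}}\bar\chi(-a)\hat f((qn-a)/q)\\ &= \frac{G(\chi)}{q}\sum_{1\le a\le q}\sum_{n\in\mathbb{Z}}\bar\chi(-a)\hat f((qn-a)/q),\end{aligned}$$
where in the last step we used that $G(\bar\chi)=\chi(-1)\overline{G(\chi)}$ and $|G(\chi)|^2=q$ (Theorem 10.4).
Since $\bar\chi(-a)=\bar\chi(qn-a)$ for all $n\in\mathbb{Z}$, and $qn-a$ runs over each integer exactly once as $a$ and $n$ vary, the theorem follows.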
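Theorem 10.5 can be tested numerically with Gaussian test functions, which satisfy its decay hypotheses. With the convention $\hat f(\xi)=\int f(x)e(-x\xi)\,dx$ used in the proof, $f(x)=e^{-\pi x^2}$ has $\hat f=f$, and $f(x)=xe^{-\pi x^2}$ has $\hat f=-if$. These closed forms are standard facts assumed here. The sketch below is our own illustration. It uses the Legendre symbol mod a prime, computed by Euler's criterion, as the primitive character. It truncates both sides to $|n|\le200$, far beyond the point where the Gaussians underflow. The modulus 13 gives an even character and 7 an odd one.

```python
import cmath
import math


def e(x: float) -> complex:
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def legendre(n: int, p: int) -> int:
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return 0 if r == 0 else (1 if r == 1 else -1)


def both_sides(p: int, f, f_hat, N: float, cutoff: int = 200) -> tuple:
    """Truncated left and right sides of Theorem 10.5 for chi = (./p).

    LHS = sum_n chi(n) f(n/N);  RHS = (G(chi) N / p) sum_n conj(chi(n)) f_hat(n N / p).
    The Legendre symbol is real, so conj(chi(n)) = chi(n).
    """
    chi = lambda n: legendre(n, p)
    gauss = sum(chi(n) * e(n / p) for n in range(1, p + 1))
    ns = range(-cutoff, cutoff + 1)
    lhs = sum(chi(n) * f(n / N) for n in ns)
    rhs = gauss * N / p * sum(chi(n) * f_hat(n * N / p) for n in ns)
    return lhs, rhs


def gaussian(x: float) -> float:
    return math.exp(-math.pi * x * x)        # its Fourier transform is itself


def odd_gaussian(x: float) -> float:
    return x * math.exp(-math.pi * x * x)    # its Fourier transform is -i times itself


def odd_gaussian_hat(xi: float) -> complex:
    return -1j * odd_gaussian(xi)


if __name__ == "__main__":
    print(both_sides(13, gaussian, gaussian, N=5.0))             # even character
    print(both_sides(7, odd_gaussian, odd_gaussian_hat, N=3.0))  # odd character
```

In each printed pair the two entries should agree up to round-off.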


If we let $N\in[1,q]$ in Theorem 10.5 and we assume that $\operatorname{supp}(f)\subseteq(0,1]$, then the sum $\sum_{n\in\mathbb{Z}}\chi(n)f(n/N)$ is supported on integers $n\in[0,N]$.
Since $\hat f(\xi)\ll1/|\xi|^2$ for $|\xi|\ge1$, the dominant contribution to the dual sum
$\sum_{n\in\mathbb{Z}}\bar\chi(n)\hat f(Nn/q)$ comes from integers $n=O(q/N)$. Hence, roughly speaking,
Theorem 10.5 transforms a sum of length $\asymp N$ to a sum of length $\asymp q/N$.
In particular, if $N>\sqrt q$, then the new sum is shorter, so that bounding it
trivially yields a non-trivial bound on the sum we began with.

More precisely, since $\hat f(\xi)\ll\min\{1,1/|\xi|^2\}$ for all $\xi$, the sum on the
right-hand side of Theorem 10.5 is
$$\sum_{n\in\mathbb{Z}}\hat f(Nn/q)\bar\chi(n)\ll\sum_{|n|\le q/N}1+\sum_{|n|>q/N}\frac{1}{(Nn/q)^2}\ll\frac qN.$$
Together with Theorems 10.4 and 10.5, this implies that
$$\sum_{n\in\mathbb{Z}}f(n/N)\chi(n)\ll\frac{N}{\sqrt q}\cdot\frac qN = \sqrt q,\tag{10.10}$$
another occurrence of square-root cancellation, at least when $N\asymp q$.

In practice, we often need an estimate like (10.10) but with the integers
$n\in[0,N]$ weighted by 1 and not by the smooth weight $f(n/N)$. Such an
estimate is provided by the Pólya–Vinogradov inequality, which we prove
using a variation of the argument leading to Theorem 10.5.


Theorem 10.6 (Pólya–Vinogradov inequality). Let $\chi$ be a non-principal
character mod $q$. For $M\in\mathbb{R}$ and $N\in\mathbb{R}_{\ge0}$, we have
$$\sum_{M<n\le M+N}\chi(n)\ll\sqrt q\log q.$$


Proof. Since $\chi$ is non-principal, we must have that $q>1$. We may also
assume that $M,N\in\mathbb{Z}$. First, we prove the theorem when $\chi$ is primitive.
By Theorem 10.3 and the periodicity of $\chi$, we have
$$\chi(n) = \frac{1}{G(\bar\chi)}\sum_{-q/2<a\le q/2}\bar\chi(a)e(an/q).$$




Summing the above formula over $n\in(M,M+N]$, we find
$$\sum_{M<n\le M+N}\chi(n) = \frac{1}{G(\bar\chi)}\sum_{-q/2<a\le q/2}\bar\chi(a)\sum_{M<n\le M+N}e(an/q).\tag{10.11}$$
Notice that we may assume that $a\ne0$, since $\bar\chi(0)=0$. The sum of $e(an/q)$
over $n\in(M,M+N]$ is a geometric series with ratio of consecutive terms
$e(a/q)$. We thus have
$$\Big|\sum_{M<n\le M+N}e(an/q)\Big| = \frac{|e(aN/q)-1|}{|e(a/q)-1|}\le\frac{2}{|e(a/q)-1|}\le\frac{q}{2|a|}\tag{10.12}$$
for $1\le|a|\le q/2$, since $|1-e(x)| = |e(-x/2)-e(x/2)| = 2|\sin(\pi x)|\ge4|x|$ for $x\in[-1/2,1/2]$. Combining (10.11) and (10.12) with the fact that
$|G(\bar\chi)|=\sqrt q$, we conclude that
$$\Big|\sum_{M<n\le M+N}\chi(n)\Big|\le\frac{1}{\sqrt q}\sum_{1\le|a|\le q/2}\frac{q}{2|a|}\le\sqrt q\,(1+\log(q/2)).$$
This completes the proof in the case that $\chi$ is primitive.


Finally, if $\chi$ is induced by the primitive character $\psi$ (mod $d$) with $d\mid q$,
then
$$\chi(n) = \psi(n)1_{(n,q)=1} = \psi(n)1_{(n,q/d)=1}$$
because $\psi$ is supported on integers coprime to $d$. Möbius inversion then
implies that
$$\sum_{M<n\le M+N}\chi(n) = \sum_{\substack{M<n\le M+N\\(n,q/d)=1}}\psi(n) = \sum_{M<n\le M+N}\psi(n)\sum_{a\mid(n,q/d)}\mu(a) = \sum_{a\mid q/d}\mu(a)\sum_{\substack{M<n\le M+N\\a\mid n}}\psi(n).$$
In the inner sum, we write $n=ma$ and note that $\psi(n)=\psi(a)\psi(m)$, so
that $\psi(a)$ can be factored outside the summation. Applying the Pólya–Vinogradov inequality to the primitive character $\psi$ (which is non-principal, since $\chi$ is) yields the estimate
$$\sum_{M<n\le M+N}\chi(n)\ll\sum_{a\mid q/d}\sqrt d\log d.$$
Every divisor $a$ of $q/d$ comes with a complementary divisor $(q/d)/a$. At least
one of these divisors is $\le\sqrt{q/d}$, so that the total number of permissible
values of $a$ is $\le2\sqrt{q/d}$. (Or simply use the bound in Exercise 2.9(f).) Hence the sum is $\ll\sqrt{q/d}\cdot\sqrt d\log d\le\sqrt q\log q$, which
completes the proof in this case as well.
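In the primitive case, the proof gives an explicit constant: $|\sum_{M<n\le M+N}\chi(n)|\le\sqrt q\sum_{1\le a\le q/2}1/a\le\sqrt q(1+\log(q/2))$. The sketch below, our own illustration, uses the Legendre symbol mod a prime $p$ as the primitive character. It computes the largest interval sum exactly, using that $S(x)=\sum_{1\le n\le x}\chi(n)$ is $p$-periodic. It then compares the result with this bound and with $\sqrt p\log p$. The function names are hypothetical.

```python
import math


def legendre(n: int, p: int) -> int:
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return 0 if r == 0 else (1 if r == 1 else -1)


def max_interval_sum(p: int) -> int:
    """max over M, N of |sum_{M < n <= M+N} (n/p)|.

    S(x) = sum_{1 <= n <= x} (n/p) has period p (a full period sums to 0), and
    every interval sum equals S(M+N) - S(M), so max S - min S over one period suffices.
    """
    s = lo = hi = 0
    for n in range(1, p + 1):
        s += legendre(n, p)
        lo, hi = min(lo, s), max(hi, s)
    return hi - lo


def proof_bound(p: int) -> float:
    """sqrt(p) * sum_{1 <= a <= p/2} 1/a, the explicit bound from the proof."""
    return math.sqrt(p) * sum(1 / a for a in range(1, p // 2 + 1))


if __name__ == "__main__":
    for p in (101, 1009, 10007):
        print(p, max_interval_sum(p), round(proof_bound(p), 1),
              round(math.sqrt(p) * math.log(p), 1))
```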


Remark 10.7. Comparing the right-hand side in Theorem 10.6 with that in
(10.10), we see we have an extra logarithm. This is caused by the fact that we
have replaced the smooth cut-off $f(n/N)$ by the sharp cut-off $1_{(M,M+N]}(n)$.




Indeed, in the proof of Theorem 10.6, the sum $\sum_{M<n\le M+N}e(\alpha n)$ with
$\alpha=a/q\in[-1/2,1/2]$ decays like $1/|\alpha|$. Had we weighted the integers
$n\in(M,M+N]$ smoothly, we would have had faster decay, say $\ll1/|\alpha|^2$.
Superficially, this extra logarithm seems like a technical and insignificant
matter. After all, it is of negligible size compared to $\sqrt q$. However, improving
upon the Pólya–Vinogradov inequality is very hard and is related to some
very deep conjectures about Dirichlet characters. Paley [149] showed the
existence of an infinite set of primitive quadratic characters $\chi$ for which
$$M(\chi) = \sup_{M,N}\Big|\sum_{M<n\le M+N}\chi(n)\Big|\gg\sqrt q\log\log q,$$
with $q$ denoting the conductor of $\chi$. On the other hand, Montgomery and
Vaughan proved that $M(\chi)\ll\sqrt q\log\log q$ assuming a suitable generalization of the Riemann Hypothesis called the Generalized Riemann Hypothesis
that we will discuss in the next chapter.


Remarkably, when $\chi$ has odd order $g$ as an element of the group of Dirichlet characters mod $q$, Granville and Soundararajan showed that the
Pólya–Vinogradov inequality can be improved. Their results were sharpened
by Goldmakher [61], who established the estimate $M(\chi)\ll_g\sqrt q(\log q)^{\delta}$ for
each fixed $\delta>(g/\pi)\sin(\pi/g)$. A further improvement of this result was
announced more recently by Lamzouri and Mangerel [123].


Exercises 


Exercise 10.1. Show that the function
$$f(q) = \sum_{n\in(\mathbb{Z}/q\mathbb{Z})^*}e(n/q)$$
is multiplicative and calculate it.


Exercise 10.2.

(a) Calculate all primitive characters mod 3, 4, 5, 8 and 15.

(b) If $q$ is a prime, then show that there are $q-2$ primitive characters.

(c) Show that a faithful character, defined in Exercise 9.5, is also primitive.

(d) Calculate all primitive and all faithful characters when $q$ is the product of two
distinct primes, and when $q=p^2$ with $p$ prime.

(e) If $C_q$ denotes the set of Dirichlet characters mod $q$, and $C_q^*$ denotes the set of
primitive elements of $C_q$, then calculate $|C_q^*|$. [Hint: Prove that $|C_q|=\sum_{d\mid q}|C_d^*|$.]

(f) Let $\chi=\chi_1\chi_2$ be as in Exercise 9.4(a) with $(q_1,q_2)=1$. Show that $\chi$ is primitive
if and only if $\chi_1$ and $\chi_2$ are primitive.




(g) If $\chi$ is a real primitive character mod $q$, then show that $q=2^kq'$, where
$k\in\{0,2,3\}$ and $q'$ is odd and square-free.¹ Moreover, if $k\in\{0,2\}$, then there
is exactly one real primitive character mod $q$, whereas if $k=3$, then there are
exactly two real primitive characters mod $q$.


Exercise 10.3. Given two primitive characters $\chi,\psi$ (mod $q$) and an integer $a\in\mathbb{Z}$
coprime to $q$, show that
$$\sum_{n\in\mathbb{Z}/q\mathbb{Z}}\chi(n+a)\bar\psi(n) = \frac{\chi(-a)\bar\psi(-a)G(\chi)G(\bar\chi\psi)}{G(\psi)}.$$
Simplify the above expression when $\psi=\chi$. [Hint: Use Exercise 10.1.]


Exercise 10.4* Given a non-principal character $\chi$ mod $q$ and $N\in[1,q/2]\cap\mathbb{Z}$,
show that
$$\sum_{|n|\le N}\chi(n)(1-|n|/N)\ll\sqrt q.$$
[Hint: $\sum_{|n|\le N}(1-|n|/N)e(n\alpha) = N^{-1}(\sin(\pi N\alpha)/\sin(\pi\alpha))^2$.]
Exercise 10.5* If $\chi$ (mod $q$) is induced by $\psi$ (mod $d$), then prove that²
$$G(\chi) = \mu(m)\psi(m)G(\psi)\tag{10.13}$$
with $m=q/d$, as follows:

(a) For any $k\in\mathbb{Z}$, show that
$$G(\chi) = \sum_{\substack{1\le a\le q\\(a+kd,q)=1}}\psi(a)e((a+kd)/q),$$
and conclude that
$$G(\chi) = \frac1m\sum_{1\le a\le q}\psi(a)e(a/q)\sum_{\substack{1\le k\le m\\(a+kd,q)=1}}e(k/m).$$

(b) When $(a,d)=1$, show that
$$\sum_{\substack{1\le k\le m\\(a+kd,q)=1}}e(k/m) = 1_{(d,m)=1}\,\mu(m)e(-a\bar d/m),$$
where $\bar d$ is the multiplicative inverse of $d$ (mod $m$).

(c) When $(d,m)>1$, show that both sides of (10.13) are zero.

(d) Assume that $(d,m)=1$. If $\bar m$ denotes the multiplicative inverse of $m$ (mod $d$),
show that $\bar d/m+\bar m/d\equiv1/q\pmod1$, and complete the proof of (10.13).




¹There is an important connection between real Dirichlet characters and the theory of binary
quadratic forms, presented in Chapters 5 and 6 of Davenport's book [31].
²See p. 67 for an alternative proof.


Chapter 11 


Dirichlet L-functions 


We now turn to the study of the infinite series $L(s,\chi)=\sum_{n=1}^\infty\chi(n)/n^s$,
namely the $L$-function corresponding to the Dirichlet character $\chi$. Since
$|\chi|\le1$, this series converges absolutely when $\mathrm{Re}(s)>1$, and for such $s$ we
have the Euler product representation
$$L(s,\chi) = \prod_p\Big(1-\frac{\chi(p)}{p^s}\Big)^{-1}.\tag{11.1}$$

When $\chi\ne\chi_0$, the summatory function $\sum_{n\le x}\chi(n)$ is uniformly bounded
by the Pólya–Vinogradov inequality. Partial summation then implies that
$L(s,\chi)$ converges conditionally in the half-plane $\mathrm{Re}(s)>0$. It is easily seen
that $L(s,\chi)$ diverges at $s=0$, so that $L(s,\chi)$ has abscissa of convergence 0
(but abscissa of absolute convergence 1).
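For the non-principal character $\chi$ mod 4 this behaviour is easy to watch numerically. The sketch below is our own illustration. It uses the classical evaluation $L(1,\chi)=\pi/4$ (Leibniz's series), which is quoted here and not proved in this section. It also shows that at $s=0$ the partial sums keep oscillating between 0 and 1.

```python
import math


def chi4(n: int) -> int:
    """The non-principal Dirichlet character mod 4."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)


def partial_L(s: float, N: int) -> float:
    """Partial sum sum_{n <= N} chi4(n) / n^s of the Dirichlet series L(s, chi4)."""
    return sum(chi4(n) / n ** s for n in range(1, N + 1))


if __name__ == "__main__":
    print(partial_L(1.0, 10**5), math.pi / 4)                 # error at most about 1/N
    print(partial_L(0.5, 10**5), partial_L(0.5, 2 * 10**5))   # slow convergence at s = 1/2
    print(sorted({partial_L(0.0, N) for N in range(1, 50)}))  # [0.0, 1.0]: no limit at s = 0
```

Because the non-zero terms alternate in sign with decreasing size for $s>0$, the error of each partial sum is at most the first omitted term. This is a special case of the partial summation argument above.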


We will show that, when $\chi$ is non-principal, $L(s,\chi)$ can be extended to an entire function. Moreover, we will prove that it enjoys various special properties similar to the
ones possessed by $\zeta$.


The analytic continuation and the functional equation 


Notice that if $\chi$ (mod $q$) is induced by the character $\psi$ (mod $d$), then
$$L(s,\chi) = \prod_{p\nmid q}\Big(1-\frac{\psi(p)}{p^s}\Big)^{-1} = L(s,\psi)\prod_{p\mid q}\Big(1-\frac{\psi(p)}{p^s}\Big).\tag{11.2}$$

Because of this relation, the properties of $L(s,\psi)$ transfer (with appropriate
modifications) to $L(s,\chi)$, so we often restrict our attention to the study
of Dirichlet $L$-functions attached to primitive characters, whose theory is






simpler. In particular, Dirichlet $L$-functions attached to primitive characters satisfy a functional equation analogous to (6.1) for the Riemann zeta
function.


The functional equation of $L(s,\chi)$ changes slightly according to the value
of $\chi(-1)$. Characters with $\chi(-1)=1$ are called even, whereas characters
with $\chi(-1)=-1$ are called odd. We then take
$$a = \begin{cases}0&\text{when }\chi\text{ is even},\\1&\text{when }\chi\text{ is odd},\end{cases}\tag{11.3}$$
and we introduce the so-called completed $L$-function
$$\xi(s,\chi) = (q/\pi)^{s/2}\,\Gamma\Big(\frac{s+a}{2}\Big)L(s,\chi),$$
which is analogous to the function $\xi(s)$ that we defined in Exercise 8.7. The
functional equation for $L(s,\chi)$ also involves the quantity
$$\varepsilon(\chi) = \frac{G(\chi)}{i^a\sqrt q},$$
called its root number. Note that $|\varepsilon(\chi)|=1$ by virtue of Theorem 10.4.
Furthermore, it is easy to check that $\varepsilon(\bar\chi)=1/\varepsilon(\chi)$.
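The root number $\varepsilon(\chi)=G(\chi)/(i^a\sqrt q)$ is easy to compute for characters mod a prime $p$, all non-principal ones being primitive. The sketch below is our own illustration. It builds the characters from a primitive root, an assumed standard construction. It checks $|\varepsilon(\chi)|=1$ and $\varepsilon(\bar\chi)=1/\varepsilon(\chi)$. It also illustrates Gauss's classical evaluation of the quadratic Gauss sum, quoted here rather than proved in the text, which amounts to $\varepsilon(\chi)=1$ for the Legendre symbol.

```python
import cmath
import math


def e(x: float) -> complex:
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def primitive_root(p: int) -> int:
    """Smallest generator of (Z/pZ)* for an odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("expected an odd prime")


def character_mod_prime(p: int, j: int) -> list:
    """chi_j(g^k) = e(j*k/(p-1)), chi_j(0) = 0; chi_j is primitive for j != 0 mod p-1."""
    g = primitive_root(p)
    values = [0j] * p
    for k in range(p - 1):
        values[pow(g, k, p)] = e(j * k / (p - 1))
    return values


def root_number(chi: list) -> complex:
    """epsilon(chi) = G(chi) / (i^a sqrt(q)), with a = 0 if chi(-1) = 1 and a = 1 otherwise."""
    q = len(chi)
    gauss = sum(chi[n % q] * e(n / q) for n in range(1, q + 1))
    a = 0 if abs(chi[q - 1] - 1) < 1e-9 else 1   # chi(-1) = chi(q - 1)
    return gauss / (1j ** a * math.sqrt(q))


if __name__ == "__main__":
    p = 11
    for j in range(1, p - 1):
        eps = root_number(character_mod_prime(p, j))
        eps_bar = root_number(character_mod_prime(p, p - 1 - j))  # conjugate character
        print(j, abs(eps), eps * eps_bar)   # |eps| = 1 and eps(chi) * eps(chi-bar) = 1
```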


Theorem 11.1. Let $\chi$ be a primitive, non-principal character. The functions $L(s,\chi)$ and $\xi(s,\chi)$ can be continued analytically to the entire complex
plane. Moreover, for all $s\in\mathbb{C}$, their extensions satisfy the functional equation
$$\xi(s,\chi) = \varepsilon(\chi)\cdot\xi(1-s,\bar\chi).$$


Proof. The key to the proof of this theorem is the variation of the Poisson
summation formula given in Theorem 10.5. The argument is very similar to
the one leading to (6.1), so we only sketch it.

We take $f(x)=2x^ae^{-\pi x^2}$, which has the same parity as $\chi$, i.e., $f(-x)=\chi(-1)f(x)$, and note that its Mellin transform is
$$\int_0^\infty f(y)y^{s-1}\,dy = \pi^{-(s+a)/2}\,\Gamma\Big(\frac{s+a}{2}\Big).$$
Arguing as in (6.4), we find
$$\pi^{-a/2}\xi(s,\chi) = q^{s/2}\pi^{-(s+a)/2}\,\Gamma\Big(\frac{s+a}{2}\Big)L(s,\chi) = \int_0^\infty S_\chi(y)\,y^{s-1}\,dy,\qquad S_\chi(y):=\sum_{n=1}^\infty\chi(n)f(ny/\sqrt q),\tag{11.4}$$
for $\mathrm{Re}(s)>1$. Since $\chi(0)=0$, and $f$ and $\chi$ have the same parity, we have
$$S_\chi(y) = \frac12\sum_{n\in\mathbb{Z}}\chi(n)f(ny/\sqrt q).$$




We apply Theorem 10.5 to the function $f$ with $N=\sqrt q/y$ to find that
$$S_\chi(y) = \frac{G(\chi)}{2\sqrt q\,y}\sum_{n\in\mathbb{Z}}\bar\chi(n)\hat f\Big(\frac{n}{\sqrt q\,y}\Big).$$
For our choice of $f$, we note that $\hat f=\chi(-1)i^af$, so that
$$S_\chi(y) = \frac{\chi(-1)i^aG(\chi)}{\sqrt q\,y}\,S_{\bar\chi}(1/y) = \frac{\varepsilon(\chi)S_{\bar\chi}(1/y)}{y}.$$
We insert the above transformation formula into the part of the integral
over $y\in(0,1)$ in (11.4). As for the Riemann zeta function, this proves at
the same time that $\xi(s,\chi)$ can be extended to an entire function, as well as
that it satisfies the claimed functional equation. We leave the verification of
the details of this claim as an exercise. Finally, we note that, since $\Gamma$ does
not vanish anywhere, $L(s,\chi)$ itself extends to an entire function.
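The transformation formula at the heart of the proof can be checked numerically. It states that $S_\chi(y)=\varepsilon(\chi)S_{\bar\chi}(1/y)/y$, where $S_\chi(y)=\sum_{n\ge1}\chi(n)f(ny/\sqrt q)$, $f(x)=2x^ae^{-\pi x^2}$ and $\varepsilon(\chi)=G(\chi)/(i^a\sqrt q)$. The sketch below is our own illustration. It does this for the non-principal characters mod 5, 7 and 11, built from a primitive root (an assumed standard construction), and so covers both even and odd characters. Truncating at 80 terms goes far beyond the point where the Gaussian factors underflow for the values of $y$ used.

```python
import cmath
import math


def e(x: float) -> complex:
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def character_mod_prime(p: int, g: int, j: int) -> list:
    """chi(g^k) = e(j*k/(p-1)), chi(0) = 0; g is assumed to be a primitive root mod p."""
    values = [0j] * p
    for k in range(p - 1):
        values[pow(g, k, p)] = e(j * k / (p - 1))
    return values


def parity(chi: list) -> int:
    """a = 0 if chi(-1) = 1 (even) and a = 1 if chi(-1) = -1 (odd), as in (11.3)."""
    return 0 if abs(chi[-1] - 1) < 1e-9 else 1


def root_number(chi: list) -> complex:
    """epsilon(chi) = G(chi) / (i^a sqrt(q))."""
    q = len(chi)
    gauss = sum(chi[n % q] * e(n / q) for n in range(1, q + 1))
    return gauss / (1j ** parity(chi) * math.sqrt(q))


def S(chi: list, y: float, terms: int = 80) -> complex:
    """S_chi(y) = sum_{n >= 1} chi(n) f(n y / sqrt(q)) with f(x) = 2 x^a exp(-pi x^2)."""
    q, a = len(chi), parity(chi)
    total = 0j
    for n in range(1, terms + 1):
        x = n * y / math.sqrt(q)
        total += chi[n % q] * 2 * x ** a * math.exp(-math.pi * x * x)
    return total


def transformation_gap(chi: list, y: float) -> float:
    """|S_chi(y) - eps(chi) S_chibar(1/y) / y|, which should vanish up to round-off."""
    chi_bar = [c.conjugate() for c in chi]
    return abs(S(chi, y) - root_number(chi) * S(chi_bar, 1 / y) / y)


if __name__ == "__main__":
    for p, g in ((5, 2), (7, 3), (11, 2)):
        for j in range(1, p - 1):
            chi = character_mod_prime(p, g, j)
            print(p, j, parity(chi), transformation_gap(chi, 0.7))  # gaps at round-off level
```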


When $\chi$ is a primitive, non-principal character, Theorem 11.1 allows us
to obtain some information regarding the location of the zeroes of $L(s,\chi)$
which, in view of the explicit formula (9.6), rule the distribution of primes in
arithmetic progressions. When $\mathrm{Re}(s)>1$, the Euler product representation
(11.1) implies that $L(s,\chi)$ does not vanish. Thus neither does $\xi(s,\chi)$ and,
by the functional equation, we also have $\xi(s,\chi)\ne0$ for $\mathrm{Re}(s)<0$. Since
$\Gamma((s+a)/2)$ has simple poles at the points $-2n-a$, $n\in\mathbb{Z}_{\ge0}$, we deduce
that $L(s,\chi)$ must have simple zeroes at $-1,-3,-5,\dots$ when $\chi$ is odd, and
at $-2,-4,-6,\dots$ when $\chi$ is even, but no other zeroes when $\mathrm{Re}(s)<0$.
Moreover, $L(0,\chi)=0$ when $\chi(-1)=1$. In Theorem 12.8 we will show that
$L(1,\chi)\ne0$, which implies that 0 must be a simple zero of $L(s,\chi)$ when $\chi$
is even, and that $L(0,\chi)\ne0$ when $\chi$ is odd.


The zeroes of $L(s,\chi)$ at the points $-2n-a$ with $n\in\mathbb{Z}_{\ge0}$ are called
the trivial zeroes of $L(s,\chi)$. All other zeroes of $L(s,\chi)$, which are in correspondence with the zeroes of $\xi(s,\chi)$ and necessarily lie in the critical
strip $0\le\mathrm{Re}(s)\le1$, are called non-trivial. They are usually denoted by
$\rho=\beta+i\gamma$, where $0\le\beta\le1$, or by $\rho_\chi=\beta_\chi+i\gamma_\chi$ when we want to underline their dependence on $\chi$. We note that the functional equation and the
obvious symmetry $\overline{L(s,\chi)}=L(\bar s,\bar\chi)$ imply that if $\rho=\beta+i\gamma$ is a non-trivial
zero of $L(s,\chi)$, then so is $1-\bar\rho=1-\beta+i\gamma$. It is widely believed that
an extension of the Riemann Hypothesis holds, often called the Generalized
Riemann Hypothesis. This conjecture postulates that all non-trivial zeroes
of $L(s,\chi)$ lie on the critical line $\mathrm{Re}(s)=1/2$.


Finally, we consider the case when $\chi$ is an imprimitive character induced
by $\psi$ (mod $d$). Recall the factorization (11.2). In particular, all zeroes of
$L(s,\psi)$ are also zeroes of $L(s,\chi)$. Notice that $L(s,\chi)$ might have some
additional zeroes, at points $s$ with $p^s=\psi(p)$ for some $p\mid q$. All such zeroes




are on the line $\mathrm{Re}(s)=0$ and we consider them to be trivial zeroes of $L(s,\chi)$,
together with the trivial zeroes of $L(s,\psi)$ at $s=-2n-a$ (with the caveat
that $s=0$ is excluded if $\psi=1$, that is to say, if $\chi$ is principal). All other
zeroes of $L(s,\chi)$ are considered non-trivial; the summation in (9.6) runs over
them.


Bounds for $L(s,\chi)$


As in the case of the Riemann zeta function, it is very useful to have bounds
on Dirichlet $L$-functions. By the functional equation in Theorem 11.1, we
may restrict our attention to the half-plane $\mathrm{Re}(s)\ge1/2$. We could use
Theorem 6.2, but we present instead a different method that is simpler and
thus more flexible, even though it yields weaker results. For the application
of Theorem 6.2 to $L(s,\chi)$, see Exercise 11.1.


Lemma 11.2. Let $\chi$ be a non-principal Dirichlet character mod $q$. For
$j\in\mathbb{Z}_{\ge0}$ and $s=\sigma+it$ with $1/2\le\sigma\le2$, we have
$$|L^{(j)}(s,\chi)|\ll_j Q^{\max\{0,1-\sigma\}}(\log Q)^{j+1}\quad\text{with }Q=\sqrt q\,(|t|+2).$$


Proof. We estimate $L^{(j)}(s,\chi)=\sum_{n=1}^\infty\chi(n)(-\log n)^j/n^s$ by inserting the
Pólya–Vinogradov bound on $\sum_{n\le x}\chi(n)$ via partial summation. However, we
have to be careful because partial summation yields poor bounds for small
$n$. There are two reasons for this: firstly, the function $x\mapsto x^{-s}$ oscillates a lot
for small $x$, with its derivative $-sx^{-s-1}$ getting under control only for $x\ge|s|$.
Secondly, the Pólya–Vinogradov bound is non-trivial only for character sums
of length $\ge\sqrt q\log q$. For these reasons, we bound the summands with small
$n$ trivially, noting that
$$\sum_{n\le N}\Big|\frac{\chi(n)(\log n)^j}{n^s}\Big|\le(\log N)^j\sum_{n\le N}\frac{1}{n^\sigma}\ll N^{\max\{0,1-\sigma\}}(\log N)^{j+1}.$$
To bound the terms with $n>N$, we apply partial summation:
$$\sum_{n>N}\frac{\chi(n)(-\log n)^j}{n^s} = \int_N^\infty\frac{(-\log y)^j}{y^s}\,d\Big(\sum_{N<n\le y}\chi(n)\Big) = (-1)^j\int_N^\infty\Big(\sum_{N<n\le y}\chi(n)\Big)\frac{s(\log y)^j-j(\log y)^{j-1}}{y^{s+1}}\,dy.$$
The Pólya–Vinogradov inequality then yields the estimate
$$\sum_{n>N}\frac{\chi(n)(-\log n)^j}{n^s}\ll_j\int_N^\infty\sqrt q\log q\cdot\frac{|s|(\log y)^j+j(\log y)^{j-1}}{y^{\sigma+1}}\,dy\ll_j|s|\sqrt q(\log q)(\log N)^jN^{-\sigma}.$$
Taking $N=1+\lfloor|s|\sqrt q\rfloor$ completes the proof.




Proving the explicit formula for $L(s,\chi)$


We conclude this chapter with a proof of the explicit formula for
$L(s,\chi)$, which we restate in a slightly more general form.

Theorem 11.3. For $x\ge T\ge2$ and $\chi$ (mod $q$), we have
$$\psi(x,\chi) = 1_{\chi=\chi_0}\,x-\sum_{\substack{\rho=\beta+i\gamma\\|\gamma|\le T}}\frac{x^\rho}{\rho}+O\Big(\frac xT\log^2(qx)\Big).\tag{11.5}$$
We start with a technical lemma, analogous to Lemma 8.2.


Lemma 11.4. Consider $s=\sigma+it$ and a primitive, non-principal Dirichlet
character $\chi$ mod $q\ge3$. Furthermore, let $a\in\{0,1\}$ be as in (11.3). All
implied constants below are absolute. Moreover, the zeroes of $L(s,\chi)$ are
listed and counted with their multiplicity.

(a) There are $\ll\log[q(|t|+1)]$ non-trivial zeroes of $L(s,\chi)$ with $|\gamma-t|\le1$.
(b) Assume that $|s-z|\ge1/2$ for all trivial zeroes $z$ of $L(s,\chi)$. Then
$$\frac{L'}{L}(s,\chi) = \sum_{|\gamma-t|\le1}\frac{1}{s-\rho}+O(\log[q(|s|+1)]).$$


Proof. Lemma 11.2 implies the bound $L(s,\chi)\ll(q(|s|+1))^2$ when $1/2\le\mathrm{Re}(s)\le3/2$ (in fact, it implies a better bound, but this will be sufficient).
In addition, we have $|L(s,\chi)|\le\zeta(3/2)=O(1)$ when $\mathrm{Re}(s)\ge3/2$. By
the functional equation of $L(s,\chi)$ and Exercise 1.12(a), this “polynomial
growth” of $L(s,\chi)$ in $q(|s|+1)$ can be extended to the half-plane $\mathrm{Re}(s)\ge-10$. Employing Lemma 8.6(b) as in the proof of Lemma 8.2 yields part (a).
Similarly, we also obtain part (b) when $\sigma\ge-10$. Finally, when $\sigma<-10$, we
note that, analogously to (8.9), we have $(L'/L)(s,\chi)=-(L'/L)(1-s,\bar\chi)+O(\log(q(|s|+1)))$. Part (b) follows in this case as well by the trivial bound
$(L'/L)(1-s,\bar\chi)=O(1)$.


Proof of Theorem 11.3. First of all, note that
$$\sum_{\substack{n\le x\\(n,q)>1}}\Lambda(n)\le\sum_{\substack{p\mid q,\ k\ge1\\p^k\le x}}\log p\le\sum_{p\mid q}\log x\ll(\log q)(\log x).\tag{11.6}$$
Hence, if $\xi$ (mod $m$) is the primitive character inducing $\chi$ (mod $q$), then
$$\psi(x,\chi) = \psi(x,\xi)+O((\log q)(\log x)).\tag{11.7}$$
Since $L(s,\chi)$ and $L(s,\xi)$ share the same non-trivial zeroes, (11.5) follows for
$\chi$ if we can prove it for $\xi$. This means that, without loss of generality, we
may assume that $\chi$ is primitive.

In addition, as in the proof of (5.11), we may reduce the proof to the
case when $T$ is such that $|T-\gamma|\gg1/\log x$ for all zeroes of $L(s,\chi)$.




We let $\alpha=1+1/\log x$ and use Theorem 7.2 to write
$$\psi(x,\chi) = -\frac{1}{2\pi i}\int_{\substack{\mathrm{Re}(s)=\alpha\\|\mathrm{Im}(s)|\le T}}\frac{L'}{L}(s,\chi)\frac{x^s}{s}\,ds+O\Big(\frac{x(\log x)^2}{T}+\log x\Big).$$

Due to the potential presence of zeroes of $L(s,\chi)$ at or close to $s=0$,
it is convenient to modify the integrand in the above formula and remove
the pole at the origin from the factor $x^s/s$: using Lemma 7.1, we see that
$\int_{\mathrm{Re}(s)=\alpha,\,|\mathrm{Im}(s)|\le T}n^{-s}s^{-1}\,ds=O(1/(Tn^\alpha\log n))$ for $n\ge2$. Consequently,
$$\psi(x,\chi) = -\frac{1}{2\pi i}\int_{\substack{\mathrm{Re}(s)=\alpha\\|\mathrm{Im}(s)|\le T}}\frac{L'}{L}(s,\chi)\frac{x^s-1}{s}\,ds+O\Big(\frac{x(\log x)^2}{T}+\log x\Big).$$

Similarly to the proof of Theorem 8.1 in Chapter 8, we move the contour to
the line $\mathrm{Re}(s)=-N-1/2$ with $N$ a large integer, picking up contributions
from the zeroes of $L(s,\chi)$ (and the pole at $s=1$ if $\chi=1$). In the case
when $\chi=1$, we use Lemma 8.2 to control the logarithmic derivative of
$L(s,\chi)=\zeta(s)$ on the new contour; otherwise, we use Lemma 11.4. We leave
the details as an exercise.


Exercises 


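Exercise 11.1. Fix $\varepsilon>0$ and $C\ge1$. For all primitive, non-principal Dirichlet
characters $\chi$ (mod $q$) and all $s=\sigma+it$ with $\sigma\ge-C$, show that
$$L(s,\chi)\ll_{\varepsilon,C}\begin{cases}1&\text{if }\sigma\ge1+\varepsilon,\\ [q(|t|+2)]^{(1-\sigma+\varepsilon)/2}&\text{if }-\varepsilon\le\sigma\le1+\varepsilon,\\ [q(|t|+2)]^{1/2-\sigma}&\text{if }-C\le\sigma\le-\varepsilon.\end{cases}$$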


Exercise 11.2. Assuming the Generalized Riemann Hypothesis, prove that:
(a) $\psi(x,\chi)=1_{\chi=\chi_0}\,x+O(x^{1/2}\log^2(qx))$ $\quad(x\ge1,\ \chi\ (\mathrm{mod}\ q))$.
(b) $\psi(x;q,a)=x/\varphi(q)+O(x^{1/2}\log^2(qx))$ $\quad(x\ge1,\ a\in(\mathbb{Z}/q\mathbb{Z})^*)$.


Exercise 11.3. Let $\chi$ be a primitive, non-principal Dirichlet character.

(a) Show that the Riemann Hypothesis for $L(s,\chi)$ is equivalent to knowing that
$\mathrm{Re}(\rho)\le1/2$ for all non-trivial zeroes $\rho$ of $L(s,\chi)$.

(b) Show that the Riemann Hypothesis for $L(s,\chi)$ is equivalent to knowing that
for each fixed $\varepsilon>0$ we have $\psi(x,\chi)\ll_{\varepsilon,\chi}x^{1/2+\varepsilon}$ uniformly for $x\ge1$.

(c) Show that $L(s,\chi)$ must have infinitely many non-trivial zeroes.
(d) Fix $\theta<1/2$. Show that we cannot have $\psi(x,\chi)\ll x^\theta$ for all $x\ge1$.


Exercise 11.4. Let $\chi$ be a primitive, non-principal character mod $q$.




(a) Let $\phi$ and $\Phi$ be as in Exercise 7.2(b), $s\in\mathbb{C}$ and $x\ge1$. If $\phi_0\ne0$, we also
assume that $s\notin\{z:L(z,\chi)=0\}$. Then
$$\sum_{n\ge1}\frac{\Lambda(n)\chi(n)\phi(n/x)}{n^s} = -\phi_0\frac{L'}{L}(s,\chi)-\sum_z x^{z-s}\Phi(z-s),$$
where $z$ runs over all zeroes of $L(s,\chi)$ (trivial and non-trivial).
(b) Assume the Riemann Hypothesis for $L(s,\chi)$. Show that


L' 
7 (5X); log L(s, x) Ke (log(alt| + qe 


for $\mathrm{Re}(s)\ge1/2+\varepsilon$ and $\varepsilon>0$, in two ways: firstly, use part (a) as per Exercise
8.5(c); secondly, use a suitable adaptation of Theorem


(c) Assume the Riemann Hypothesis for $L(s,\chi)$ and fix $\varepsilon>0$. Show that
$$\sum_{n\le x}\mu(n)\chi(n)\ll q^\varepsilon x^{1/2+\varepsilon}\quad\text{for all }x\ge1,$$
with the implied constant depending at most on $\varepsilon$.


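Exercise 11.5. Assume the Generalized Riemann Hypothesis. Given real numbers
$x,q,T\ge3$ and a residue class $a\in(\mathbb{Z}/q\mathbb{Z})^*$, we define
$$B(q,a) = \#\{n\ (\mathrm{mod}\ q): n^2\equiv a\ (\mathrm{mod}\ q)\},$$
$$\theta(x;q,a) = \sum_{\substack{p\le x\\p\equiv a\ (\mathrm{mod}\ q)}}\log p,\qquad Z_T(x;q,a) = \frac{1}{\varphi(q)}\sum_{\chi\ (\mathrm{mod}\ q)}\bar\chi(a)\sum_{|\gamma_\chi|\le T}\frac{x^{i\gamma_\chi}}{1/2+i\gamma_\chi}.$$
(a) For $x\ge T^{3/2}\ge2$, prove that
$$\theta(x;q,a) = \frac{x}{\varphi(q)}-\sqrt x\Big(\frac{B(q,a)}{\varphi(q)}+Z_T(x;q,a)\Big)+O_q\Big(\frac{x\log^2x}{T}\Big).$$
[Hint: $\theta(x;q,a)=\psi(x;q,a)-\sum_{p\le\sqrt x,\ p^2\equiv a\ (\mathrm{mod}\ q)}\log p+O(x^{1/3})$.]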


(b) For $x\to\infty$, prove that¹
$$\theta(x;4,3)-\theta(x;4,1) = \sqrt x\Big(1+\sum_{|\gamma_\chi|\le x^{2/3}}\frac{x^{i\gamma_\chi}}{1/2+i\gamma_\chi}\Big)+o(\sqrt x),\tag{11.8}$$
where $\chi$ is the unique non-principal character mod 4.


Exercise 11.6* ([31, Chapter 16]). Let $\chi$ (mod $q$) be a primitive, non-principal
character and write $N(T,\chi)$ for the number of non-trivial zeroes $\rho$ of $L(s,\chi)$ with
$|\gamma|\le T$, counted with multiplicity. Throughout, $T$ is chosen so that $L(s,\chi)\ne0$
when $\mathrm{Im}(s)=\pm T$.


¹If we let $x=e^u$, then $\int_0^{\log x}e^{i\gamma u}\,du=O(1/|\gamma|)=o(\log x)$ for $\gamma\ne0$. Hence, we expect
that the sum over zeroes in (11.8) is $o(1)$ on average over $x$. The presence of the term 1 on the
right-hand side of (11.8) then means that most of the time we have $\theta(x;4,3)>\theta(x;4,1)+\delta\sqrt x$
with $\delta>0$, meaning there are slightly more primes in the residue class 3 (mod 4) than in 1 (mod 4).
This discrepancy is called Chebyshev's bias and it is explained in detail in [157] and [70].




(a) Let $R$ be the rectangle with vertices $5/2\pm iT$ and $-3/2\pm iT$ traversed counterclockwise, and let $\mathcal L$ be its right half. Prove that
m-(N(T,x) +1) = Axarg &(s, x). 
(b) Adapting the argument of Exercise 8.8, prove that
$$N(T,\chi) = \frac T\pi\log\frac{qT}{2\pi e}+O(\log(qT))\qquad(T\ge2).$$


Exercise 11.7* ([31, Chapters 11-12]). Let $\chi$ be a primitive, non-principal Dirichlet
character such that² $\rho\ne0$ for all non-trivial zeroes $\rho$ of $L(s,\chi)$.

(a) Prove that the Hadamard product $h(s,\chi)=\prod_\rho(1-s/\rho)e^{s/\rho}$, defined over all
non-trivial zeroes $\rho$ of $L(s,\chi)$, converges absolutely and uniformly on compact
subsets of $\mathbb{C}$.

(b) Prove that $\xi(s,\chi)=e^{A_\chi+B_\chi s}h(s,\chi)$ for some $A_\chi,B_\chi\in\mathbb{C}$ by adapting the
argument of Exercise 8.7.

(c) Show that $e^{A_\chi}=\xi(0,\chi)$, that $B_\chi=(\xi'/\xi)(0,\chi)$ and that $\mathrm{Re}(B_\chi)=-\sum_\rho\mathrm{Re}(1/\rho)$.


Exercise 11.8* Consider a primitive, non-principal character $\chi$ mod $q$, and $\phi$, $\Phi$
as in Exercise 7.2(b) with $\phi_0=1$. Let $a\in\{0,1\}$ be as in (11.3) and
$$\Delta_a(s) = \begin{cases}2^s\pi^{s-1}\Gamma(1-s)\sin(\pi s/2) = 2^{s-1}\pi^s/(\Gamma(s)\cos(\pi s/2))&\text{if }a=0,\\ 2^s\pi^{s-1}\Gamma(1-s)\cos(\pi s/2) = 2^{s-1}\pi^s/(\Gamma(s)\sin(\pi s/2))&\text{if }a=1.\end{cases}$$

(a) For each $s\in\mathbb{C}$, show that $L(s,\chi)=\varepsilon(\chi)q^{1/2-s}\Delta_a(s)L(1-s,\bar\chi)$.

(b) From now on, consider $s\in\mathbb{C}$ with $0\le\sigma\le1$, as well as $x,y\ge1$ with
$xy=q\tau/2\pi$, where $\tau=\max\{|t|,1\}$. Adapt the argument of Exercise 7.8(a,b)
to show the approximate functional equation
$$L(s,\chi) = \sum_{n\ge1}\frac{\chi(n)\phi(n/x)}{n^s}+\varepsilon(\chi)q^{1/2-s}\sum_{n\ge1}\frac{\bar\chi(n)\tilde\phi_s(n/y)}{n^{1-s}},$$
where
$$\tilde\phi_s(u) = -\frac{1}{2\pi i}\int_{(-3)}\Phi(z)\Delta_a(s+z)(u\tau/2\pi)^z\,dz.$$

(c) Show that $L(s,\chi)\ll_\varepsilon(q\tau)^{(1-\sigma)/2+\varepsilon}$ by adapting the argument of Exercise
7.8(c-f). (This reproves Exercise 11.1 inside the critical strip.)


²We will show later, in Theorems and that $L(1,\chi)\ne0$. Hence, the hypothesis that
$\rho\ne0$ for the non-trivial zeroes of $L(s,\chi)$ follows by the functional equation (Theorem 11.1).


Chapter 12 


The Prime Number 
Theorem for arithmetic 
progressions 


The pinnacle of the theory of Dirichlet $L$-functions is a quantitative form
of the Prime Number Theorem for arithmetic progressions that is known in
the literature as the Siegel–Walfisz theorem.


Theorem 12.1 (Siegel–Walfisz). Let $A>0$. There exists an absolute constant $c>0$ such that if $1\le q\le(\log x)^A$ and $a\in(\mathbb{Z}/q\mathbb{Z})^*$, then
$$\pi(x;q,a) = \frac{\mathrm{li}(x)}{\varphi(q)}+O_A\big(xe^{-c\sqrt{\log x}}\big).$$


An important feature of this result is that it proves that primes are
equidistributed in $(\mathbb{Z}/q\mathbb{Z})^*$ when the modulus $q$ tends to infinity with $x$
at a rate that is polynomial in $\log x$, something that is very important in
applications. The range of $q$ and $x$ can be significantly enlarged under the
assumption of the Generalized Riemann Hypothesis (see Exercise 11.2).
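A finite computation cannot verify an asymptotic statement such as Theorem 12.1. Still, counting primes in residue classes already shows the equidistribution it describes. The sketch below, our own illustration, tabulates the number of primes $p\le x$ with $p\equiv a\pmod q$ for each $a\in(\mathbb{Z}/q\mathbb{Z})^*$. It uses a plain sieve of Eratosthenes, with $x=10^6$ and $q=10$ chosen arbitrarily. The helper names are hypothetical.

```python
from math import gcd, isqrt


def primes_upto(x: int) -> list:
    """All primes p <= x, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, x + 1, p)))
    return [n for n in range(2, x + 1) if sieve[n]]


def counts_by_class(x: int, q: int) -> dict:
    """Number of primes p <= x in each reduced residue class a mod q (primes dividing q are skipped)."""
    counts = {a: 0 for a in range(q) if gcd(a, q) == 1}
    for p in primes_upto(x):
        r = p % q
        if r in counts:
            counts[r] += 1
    return counts


if __name__ == "__main__":
    counts = counts_by_class(10**6, 10)
    print(counts)
    print(max(counts.values()) / min(counts.values()))  # close to 1
```

The four counts add up to $\pi(10^6)-2$, since the primes 2 and 5 divide $q=10$.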


To achieve the required uniformity in $q$ and $x$, we must keep track of the
dependence on $q$ in the various estimates we prove. However, on a first reading
of this chapter, it might be easier to think of $q$ as fixed, say $q=3$.


A zero-free region for Dirichlet L-functions 


In view of the explicit formula (11.5), the bulk of the proof of Theorem 12.1
is establishing a zero-free region for Dirichlet $L$-functions. We start with a
simple corollary of Lemma 11.4.






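Lemma 12.2. Let $\chi$ be a Dirichlet character mod $q$, $s=\sigma+it$ with $\sigma\in[1,2]$, and $Z$ a sublist of the non-trivial zeroes of $L(s,\chi)$ with $|\gamma-t|\le1$,
possibly containing some zeroes multiple times. Then
$$\mathrm{Re}\Big(\frac{L'}{L}(s,\chi)\Big)\ge\mathrm{Re}\Big(\frac{-1_{\chi=\chi_0}}{s-1}\Big)+\sum_{\rho\in Z}\mathrm{Re}\Big(\frac{1}{s-\rho}\Big)-O(\log(q(|t|+2))).$$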


Proof. Let $\psi$ (mod $d$) be the primitive character inducing $\chi$. Then
$$\frac{L'}{L}(s,\chi) = \frac{L'}{L}(s,\psi)+\sum_{p\mid q}\sum_{k\ge1}\frac{\psi(p)^k\log p}{p^{ks}} = \frac{L'}{L}(s,\psi)+O(\log q)\tag{12.1}$$
by (11.2). We then apply Lemma 8.2(b) or 11.4(b) to $L(s,\psi)$, according to
whether $\psi=1$ or $\psi\ne1$, respectively. The lemma then follows by noticing
that $\mathrm{Re}(1/(s-\rho))\ge0$ when $\sigma\ge1$.


Next, we prove a generalization of the zero-free region for $\zeta$. Note that our result leaves open the possibility of certain exceptional zeroes close to 1. These potential violations of the Generalized Riemann Hypothesis require different arguments that we present in the subsequent section.


Theorem 12.3. Let $q\ge3$ and $Z_q(s)=\prod_{\chi\,(\mathrm{mod}\,q)}L(s,\chi)$. There is an absolute constant $c_1>0$ (i.e., $c_1$ is independent of $q$) such that the region of $s=\sigma+it$ with

$$\sigma\ge1-\frac{c_1}{\log(q\tau)},\qquad\text{where }\tau=\max\{1,|t|\},\tag{12.2}$$

contains at most one zero of $Z_q$. Furthermore, if this exceptional zero exists, then it is necessarily a real simple zero of $Z_q$, say $\beta_1\in[1-c_1/\log q,1]$, and there is a real, non-principal character $\chi_1\ (\mathrm{mod}\ q)$ such that $L(\beta_1,\chi_1)=0$.


Proof. By Theorem 8.3 and relation (11.2), we may restrict our attention to zeroes of $Z_q$ corresponding to non-principal characters $\chi$. As in the proof of Theorem 8.3, the idea is that if $L(s,\chi)$ has a zero close to $1+it$, then $\chi(p)\approx-p^{it}$ for most primes $p$. Therefore, $\chi^2(p)\approx p^{2it}$, which yields a pole of $L(s,\chi^2)$ at $s=1+2it$. This can only happen if $\chi^2=\chi_0$ and $t=0$, so that this pole corresponds to that of $\zeta$ at $s=1$. Real zeroes of real characters are then handled using a modification of this argument.


We now give the details of the above sketch. For convenience, we let $\mathcal{L}_\gamma=\log(q\max\{|\gamma|,1\})$. Let $\rho=\beta+i\gamma$ be a zero of $L(s,\chi)$. If $\chi$ is real, we further assume that either $\gamma\neq0$, or that $\rho$ has multiplicity $>1$ in $L(s,\chi)$. We want to show that $\rho$ lies outside the region (12.2). Assume for the sake of contradiction that $\beta>1-c_1/\mathcal{L}_\gamma$. We will show this is impossible if $c_1$ is small enough.




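Set $c_1=1/M^2$ and $\sigma=1+1/(M\mathcal{L}_\gamma)$, and note that

$$\frac{1}{M\mathcal{L}_\gamma}\le\sigma-\beta\le\frac{1}{M\mathcal{L}_\gamma}+\frac{c_1}{\mathcal{L}_\gamma}=\frac{M+1}{M^2\mathcal{L}_\gamma}\le\frac{1}{(M-1)\mathcal{L}_\gamma}.\tag{12.3}$$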
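Recall the distance function $\mathbb{D}_\sigma(f,g)$ defined in (8.4). The triangle inequality implies that

$$\mathbb{D}_\sigma(\chi(n)n^{-i\gamma},\bar\chi(n)n^{i\gamma})\le\mathbb{D}_\sigma(\chi(n)n^{-i\gamma},\mu(n))+\mathbb{D}_\sigma(\mu(n),\bar\chi(n)n^{i\gamma})=2\,\mathbb{D}_\sigma(\chi(n),\mu(n)n^{i\gamma}).$$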
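To see this triangle inequality in action, here is a minimal numerical sketch. Since (8.4) is not reproduced here, it assumes that $\mathbb{D}_\sigma(f,g)^2=\sum_p(1-\mathrm{Re}(f(p)\overline{g(p)}))(\log p)/p^\sigma$, which is consistent with the $-\zeta'/\zeta(\sigma)$ terms in the computation that follows, and it truncates the sum at $p\le20000$. The character mod 5 with $\chi(2)=i$ and the values $\gamma=3$, $\sigma=1.05$ are arbitrary illustrative choices.

```python
import math

def primes_up_to(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(n + 1) if sieve[p]]

PRIMES = primes_up_to(20000)

def dist(f, g, sigma):
    """Truncated D_sigma(f, g) under the assumed log p / p^sigma weighting."""
    total = sum((1 - (f(p) * complex(g(p)).conjugate()).real) * math.log(p) / p**sigma
                for p in PRIMES)
    return math.sqrt(total)

CHI5 = {0: 0, 1: 1, 2: 1j, 3: -1j, 4: -1}        # complex character mod 5, chi(2) = i
chi = lambda n: CHI5[n % 5]
gamma, sigma = 3.0, 1.05

f = lambda p: chi(p) * p ** (-1j * gamma)                       # chi(n) n^{-i gamma}
g = lambda p: complex(chi(p)).conjugate() * p ** (1j * gamma)   # conj(chi)(n) n^{i gamma}
mu = lambda p: -1                                                # mu at primes
mu_twist = lambda p: -(p ** (1j * gamma))                        # mu(n) n^{i gamma} at primes

lhs = dist(f, g, sigma)
rhs = dist(f, mu, sigma) + dist(mu, g, sigma)
print(lhs, rhs, 2 * dist(chi, mu_twist, sigma))
```

The last two printed numbers coincide, mirroring the equality on the right of the display.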

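By a straightforward computation (consult the proof of Theorem 8.3), and using that $\sum_{p\mid q}(\log p)/p=O(\log q)$, we have

$$\mathbb{D}_\sigma(\chi(n)n^{-i\gamma},\bar\chi(n)n^{i\gamma})^2=-\frac{\zeta'}{\zeta}(\sigma)+\mathrm{Re}\Big(\frac{L'}{L}(\sigma+2i\gamma,\chi^2)\Big)+O(\log q)$$

and

$$\mathbb{D}_\sigma(\chi(n),\mu(n)n^{i\gamma})^2=-\frac{\zeta'}{\zeta}(\sigma)-\mathrm{Re}\Big(\frac{L'}{L}(\sigma+i\gamma,\chi)\Big)+O(\log q).$$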


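Since $(-\zeta'/\zeta)(\sigma)=1/(\sigma-1)+O(1)=M\mathcal{L}_\gamma+O(1)$, inserting these two identities into the square of the triangle inequality above, we infer that

$$4\,\mathrm{Re}\Big(\frac{L'}{L}(\sigma+i\gamma,\chi)\Big)+\mathrm{Re}\Big(\frac{L'}{L}(\sigma+2i\gamma,\chi^2)\Big)\le(3M+O(1))\mathcal{L}_\gamma.\tag{12.4}$$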
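We now bound from below the two summands on the left-hand side of (12.4). Lemma 12.2 with $Z=\emptyset$ implies that

$$\mathrm{Re}\Big(\frac{L'}{L}(\sigma+2i\gamma,\chi^2)\Big)\ge-\mathrm{Re}\Big(\frac{1_{\chi^2=\chi_0}}{\sigma+2i\gamma-1}\Big)-O(\mathcal{L}_\gamma)\ge-(\delta(\chi,\rho)M+O(1))\mathcal{L}_\gamma,\tag{12.5}$$

where $\delta(\chi,\rho)=1$ if $|\gamma|\le1/(2M\log q)$ and $\chi$ is real, and $\delta(\chi,\rho)=1/2$ otherwise. Similarly,

$$\mathrm{Re}\Big(\frac{L'}{L}(\sigma+i\gamma,\chi)\Big)\ge\frac{1}{\sigma-\beta}-O(\mathcal{L}_\gamma)\ge(M-O(1))\mathcal{L}_\gamma\tag{12.6}$$

by Lemma 12.2 with $Z=\{\rho\}$, and by (12.3). Inserting (12.5) and (12.6) into (12.4) gives $(3.5M-O(1))\mathcal{L}_\gamma\le(3M+O(1))\mathcal{L}_\gamma$ when $\delta(\chi,\rho)=1/2$, which is a contradiction if $M$ is chosen large.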
It remains to treat the case when $\delta(\chi,\rho)=1$, that is to say, when $\chi$ is real and $|\gamma|\le1/(2M\log q)$. There are two sub-cases. First, if $\rho$ is a multiple real zero, then Lemma 12.2 with $Z=\{\rho,\rho\}$, and relation (12.3) imply that

$$\mathrm{Re}\Big(\frac{L'}{L}(\sigma+i\gamma,\chi)\Big)\ge\frac{2}{\sigma-\beta}-O(\mathcal{L}_\gamma)\ge(2M-O(1))\mathcal{L}_\gamma.$$

Together with (12.4) and (12.5), this leads us to a contradiction when $M$ is large.




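Finally, assume that $\chi$ is real and that $0<|\gamma|\le1/(2M\log q)$. In this case, the obvious symmetry $L(\bar s,\chi)=\overline{L(s,\chi)}$ implies that $\bar\rho$ is also a zero that is different from $\rho$. Applying Lemma 12.2 with $Z=\{\rho,\bar\rho\}$ then yields

$$\mathrm{Re}\Big(\frac{L'}{L}(\sigma+i\gamma,\chi)\Big)\ge\frac{1}{\sigma-\beta}+\frac{\sigma-\beta}{(\sigma-\beta)^2+4\gamma^2}-O(\mathcal{L}_\gamma)\ge(1.5M-O(1))\mathcal{L}_\gamma$$

by (12.3), since here $\mathcal{L}_\gamma=\log q$ and hence $2|\gamma|\le1/(M\mathcal{L}_\gamma)\le\sigma-\beta$. As before, this leads to a contradiction when $M$ is large enough.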

We have thus proven that the only possible zero in the region (12.2) is a real, simple zero, and it can only arise as a zero of some Dirichlet $L$-function $L(s,\chi)$ of a real, non-principal Dirichlet character $\chi\ (\mathrm{mod}\ q)$. It remains to prove that at most one such $\chi$ exists. Assume for contradiction that there are two different such characters, say $\chi_1$ and $\chi_2$, and let $\beta_1$ and $\beta_2$, respectively, be their zeroes in the region (12.2). By the triangle inequality,

$$\mathbb{D}_\sigma(\chi_1,\chi_2)\le\mathbb{D}_\sigma(\chi_1,\mu)+\mathbb{D}_\sigma(\mu,\chi_2).\tag{12.7}$$


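Since $\chi_1$, $\chi_2$ and $\chi_1\chi_2$ are all real, non-principal characters mod $q$, Lemma 12.2 with $Z=\emptyset$ yields

$$\mathbb{D}_\sigma(\chi_1,\chi_2)^2=-\frac{\zeta'}{\zeta}(\sigma)+\frac{L'}{L}(\sigma,\chi_1\chi_2)+O(\log q)\ge\frac{1}{\sigma-1}-O(\log q)=(M-O(1))\log q,$$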


where $\sigma=1+1/(M\log q)$ and $c_1=1/M^2$ as before. On the other hand, arguing as in (12.6), we have

$$\mathbb{D}_\sigma(\chi_1,\mu)^2=-\frac{\zeta'}{\zeta}(\sigma)-\frac{L'}{L}(\sigma,\chi_1)+O(\log q)\le\frac{1}{\sigma-1}-\frac{1}{\sigma-\beta_1}+O(\log q)\ll\log q.$$

The analogous upper bound holds for $\mathbb{D}_\sigma(\mu,\chi_2)^2$ too. Inserting the above estimates into (12.7) and taking $M$ to be large enough leads to a contradiction. This completes the proof of the theorem.


Theorem 12.3 allows the potential existence of extreme violations of the Generalized Riemann Hypothesis. Before discussing this issue, let us see what we can infer from Theorem 12.3 about prime numbers.

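Theorem 12.4. There is an absolute constant $c_2>0$ such that

$$\psi(x;q,a)=\frac{x}{\varphi(q)}-\frac{\chi_1(a)x^{\beta_1}}{\varphi(q)\beta_1}+O\big(xe^{-c_2\sqrt{\log x}}\big)$$

uniformly for $x\ge q\ge3$ and $(a,q)=1$, where the term $\chi_1(a)x^{\beta_1}/(\varphi(q)\beta_1)$ is present only if there is an exceptional zero in Theorem 12.3.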




Proof. We establish the theorem with $c_2\le1$. As a consequence, we may assume that $q\le e^{\sqrt{\log x}}$; otherwise, we may simply use the trivial bound $\psi(x;q,a)\ll x(\log x)/q$.

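We use the orthogonality of Dirichlet characters (see (10.5)) and the explicit formula to write

$$\psi(x;q,a)=\frac{1}{\varphi(q)}\sum_{\chi\,(\mathrm{mod}\,q)}\bar\chi(a)\psi(x,\chi)=\frac{x}{\varphi(q)}-\frac{1}{\varphi(q)}\sum_{\chi\,(\mathrm{mod}\,q)}\bar\chi(a)\sum_{|\gamma_\chi|\le T}\frac{x^{\rho_\chi}-1}{\rho_\chi}+O\Big(\frac{x\log^2x}{T}\Big),\tag{12.8}$$

where $T\in[3,\sqrt x]$ will be chosen later. Let $\rho_\chi=\beta_\chi+i\gamma_\chi$ be a zero of $L(s,\chi)$ different from the exceptional zero $\beta_1$, and with imaginary part $\gamma_\chi$ in $[-T,T]$. We claim that there is an absolute constant $c>0$ such that

$$\frac{x^{\rho_\chi}-1}{\rho_\chi}\ll\frac{x^{1-c/\log(qT)}}{|\gamma_\chi|+1}\qquad\text{when }\rho_\chi\neq\beta_1,\ |\gamma_\chi|\le T.\tag{12.9}$$

Indeed, if $\beta_\chi\le1/3$, we use the trivial bound $(x^{\rho_\chi}-1)/\rho_\chi\ll x^{1/3}\log x\le x^{5/6}(\log x)/T$; otherwise, we use Theorem 12.3 to find that $\beta_\chi\le1-c_1/\log(qT)$ for some absolute constant $c_1>0$. Since we also have that $|\rho_\chi|\asymp1+|\gamma_\chi|$ in this case, relation (12.9) readily follows with $c=\min\{c_1,1/6\}$.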


Inserting the bound (12.9) into (12.8), we conclude that

$$\psi(x;q,a)=\frac{x}{\varphi(q)}-\frac{\bar\chi_1(a)x^{\beta_1}}{\varphi(q)\beta_1}+R$$

with a remainder term $R$ of size

$$R\ll\frac{1}{\varphi(q)}\sum_{\chi\,(\mathrm{mod}\,q)}\ \sum_{|\gamma_\chi|\le T,\ \rho_\chi\neq\beta_1}\frac{x^{1-c/\log(qT)}}{|\gamma_\chi|+1}+\frac{x\log^2x}{T}\ll x^{1-c/\log(qT)}\log^2(qT)+\frac{x\log^2x}{T},$$

where we used Lemma (a) to bound the sum over the zeroes of $L(s,\chi)$.

We choose $T=e^{\sqrt{\log x}}$ and recall that $q\le e^{\sqrt{\log x}}$. Consequently, $R\ll xe^{-c_2\sqrt{\log x}}$ with $c_2=c/3$. Since $\bar\chi_1(a)=\chi_1(a)$ by the fact that $\chi_1$ is a real character (if it exists at all), the proof is complete.


Since for the moment we know nothing about $\beta_1$, it could be the case that $\beta_1=1$. In this extreme case, and assuming that $q$ is fixed and $x\to\infty$, Theorem 12.4 implies that

$$\psi(x;q,a)=\begin{cases}(2+o(1))\,x/\varphi(q)&\text{if }\chi_1(a)=-1,\\ o(x)&\text{if }\chi_1(a)=1.\end{cases}\tag{12.10}$$




We thus see that an exceptional zero of $\prod_{\chi\,(\mathrm{mod}\,q)}L(s,\chi)$ at $s=1$ forces a very uneven distribution of the primes among arithmetic progressions mod $q$, with half of the reduced classes containing twice as many primes as they should, and the rest containing very few primes.
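The size of this distortion can be read off from the main terms of Theorem 12.4 alone. The sketch below uses a hypothetical helper of ours that simply drops the error term and evaluates $\psi(x;q,a)$ divided by $x/\varphi(q)$ as $1-\chi_1(a)x^{\beta_1-1}/\beta_1$. It is a heuristic model only: no exceptional zero is known to exist, and the values of $M$ and $\log q$ are arbitrary.

```python
import math

def main_term_ratio(log_x, beta1, chi1_a):
    """Main terms of Theorem 12.4 for psi(x; q, a), divided by x/phi(q).

    chi1_a = chi_1(a) is +1 or -1; the error term O(x e^{-c sqrt(log x)}) is ignored."""
    return 1 - chi1_a * math.exp((beta1 - 1) * log_x) / beta1

# Extreme case beta_1 = 1, as in (12.10): the two halves of the reduced classes.
print(main_term_ratio(50.0, 1.0, -1), main_term_ratio(50.0, 1.0, 1))   # 2.0 0.0

# A zero at beta_1 = 1 - 1/(M log q), with x = q^u: the bias persists while u is small compared to M.
M, log_q = 1000, 10.0
beta1 = 1 - 1 / (M * log_q)
for u in (10, 1000, 10000):
    print(u, main_term_ratio(u * log_q, beta1, -1))
```

The ratio stays near 2 while $x^{1-\beta_1}=e^{u/M}$ is close to 1 and relaxes to 1 once $u$ is much larger than $M$, in line with the remark attributed to Linnik below.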


More generally, it can be proven that if there is an exceptional zero at $\beta_1=1-1/(M\log q)$, then the residue classes $a\ (\mathrm{mod}\ q)$ with $\chi_1(a)=-1$ contain $(2+o_{u_1,u_2\to\infty}(1))\,x/(\varphi(q)\log x)$ primes of size $\le x$, provided that $x$ is in the range $[q^{u_1},q^{M/u_2}]$. This result is due to Linnik. We will prove a weak form of it in Chapter


Exceptional characters 


The characters $\chi$ whose $L$-function has a zero $\beta$ in the region (12.2) are called exceptional, and the zero $\beta$ is called an exceptional zero or a Landau-Siegel zero. This definition should not be taken too literally because it depends on the choice of the unspecified constant $c_1$ in (12.2). Strictly speaking, the rigorous definition of Landau-Siegel zeroes concerns a sequence of characters $\chi_j\ (\mathrm{mod}\ q_j)$ such that no product $\chi_j\chi_k$ with $j\neq k$ is principal, and for which there are real numbers

$$\beta_j=1-o_{j\to\infty}(1/\log q_j)\quad\text{such that}\quad L(\beta_j,\chi_j)=0.$$


Disproving the existence of Landau-Siegel zeroes is a major open problem 
in analytic number theory. We establish some partial results about them. 


We begin by showing the following result due to Landau, which proves 
that exceptional zeroes “repel” each other. 


Theorem 12.5 (Landau). Let $\chi_1\ (\mathrm{mod}\ q_1)$ and $\chi_2\ (\mathrm{mod}\ q_2)$ be two real, non-principal characters that are not induced by the same primitive character, and both of whose $L$-functions have real zeroes $\beta_1$ and $\beta_2$, respectively. There is an absolute constant $c>0$ such that

$$\min\{\beta_1,\beta_2\}\le1-\frac{c}{\log(q_1q_2)}.$$

Proof. The theorem follows by a simple modification of the last part of the proof of Theorem 12.3, starting with (12.7). We leave the details as an exercise.


We record two important corollaries of Theorem 12.5, both of which demonstrate the scarcity of Landau-Siegel zeroes.


Corollary 12.6. Let $\chi_j\ (\mathrm{mod}\ q_j)$ be a sequence of primitive characters of strictly increasing moduli such that $L(\beta_j,\chi_j)=0$ for some $\beta_j\ge1-c/\log q_j$. If $c$ is small enough, then $q_{j+1}\ge q_j^2$ for all $j$.




Corollary 12.7 (Page). There is an absolute constant $c>0$ such that among all real, primitive characters of conductor $\le Q$, there is at most one whose Dirichlet $L$-function has a real zero $>1-c/\log Q$.


Corollary 12.7, known in the literature as Page's theorem, does not exclude the possibility that there is one character $\chi\ (\mathrm{mod}\ q)$ with a zero at 1 that would yield the very uneven distribution of primes described in (12.10). To show there are no such zeroes, we use a different argument.


Let $\chi\ (\mathrm{mod}\ q)$ be a real, non-principal character such that $L(\beta,\chi)=0$ for some $\beta<1$. We have

$$L(1,\chi)=\int_\beta^1L'(\sigma,\chi)\,d\sigma\ll(1-\beta)q^{(1-\beta)/2}\log^2q\tag{12.11}$$

by Lemma 11.2. When $\beta\ge1-1/\log q$, this implies that

$$1-\beta\gg L(1,\chi)/\log^2q.\tag{12.12}$$

Hence we could show that $\beta$ is not too close to 1 by proving a lower bound for $L(1,\chi)$. A weak but uniform such bound follows.


Theorem 12.8. If $\chi$ is a non-principal real Dirichlet character mod $q$, then

$$L(1,\chi)\gg\frac{1}{\sqrt q\log^2q}.$$

In particular, there is an absolute constant $c>0$ such that

$$L(\sigma,\chi)\neq0\quad\text{when}\quad\sigma\ge1-\frac{c}{q^{1/2}\log^4q}.$$


Proof. The second part of the theorem follows readily from the first part and (12.12). Now, to bound $L(1,\chi)$ from below we consider the function $1*\chi$. We note that

$$(1*\chi)(p^m)=\begin{cases}m+1&\text{if }\chi(p)=1,\\ 1_{2\mid m}&\text{if }\chi(p)=-1,\\ 1&\text{if }\chi(p)=0.\end{cases}$$

Using multiplicativity, we infer that¹ $(1*\chi)(n)\ge0$ and $(1*\chi)(n^2)\ge1$ for all $n$, whence $\sum_{n\le x}(1*\chi)(n)\ge\lfloor\sqrt x\rfloor$. On the other hand, we have

$$S:=\sum_{n\le x}(1*\chi)(n)=\sum_{a\le x}\chi(a)\lfloor x/a\rfloor.$$

We thus see that the expected main term is $\sum_{a\le x}\chi(a)x/a\approx xL(1,\chi)$ for large enough $x$, which should allow us to get a lower bound for $L(1,\chi)$.
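The case analysis for $(1*\chi)(p^m)$ and its two consequences, $(1*\chi)(n)\ge0$ and $(1*\chi)(n^2)\ge1$, are easy to confirm numerically. The sketch below uses the Legendre symbol mod 7 (computed by Euler's criterion) as a sample real non-principal character; the modulus and the bound $N=5000$ are arbitrary choices of ours.

```python
import math

def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is 0, 1 or p - 1

def one_star_chi(N, p):
    """Table of (1 * chi)(n) = sum_{a | n} chi(a) for 1 <= n <= N, with chi = (./p)."""
    table = [0] * (N + 1)
    for a in range(1, N + 1):
        c = legendre(a, p)
        if c:
            for multiple in range(a, N + 1, a):
                table[multiple] += c
    return table

N, p = 5000, 7
h = one_star_chi(N, p)
print(min(h[1:]), min(h[n * n] for n in range(1, math.isqrt(N) + 1)))   # 0 1
```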


¹When $\chi$ is a primitive character, $(1*\chi)(n)$ counts ideals of norm $n$ in the ring of integers of the quadratic field $\mathbb{Q}(\sqrt{\delta q})$, where $\delta\in\{-1,+1\}$ is an appropriate sign (see [, Chapter 7]). The inequality $(1*\chi)(n^2)\ge1$ has a more conceptual proof in this context, since it is a consequence of the fact that there is always at least one ideal of norm $n^2$ (the principal ideal $(n)$).




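However, it is hard to estimate $S$ with a remainder term smaller than $\sqrt x$, which is the size of our lower bound for $S$. To bypass this technical issue, we work with the smoothed sum

$$\sum_{n\le x}(1*\chi)(n)(1-n/x)=\sum_{a\le x}\chi(a)\sum_{b\le x/a}\Big(1-\frac{ab}{x}\Big).$$

The Euler-Maclaurin summation formula (Theorem 1.10) implies that

$$\sum_{n\le x}\Big(1-\frac nx\Big)=\frac x2-\frac1x\int_0^x\{t\}\,dt.$$

Consequently,

$$\sum_{n\le x}(1*\chi)(n)(1-n/x)=\frac x2\sum_{a\le x}\frac{\chi(a)}{a}-\frac1x\int_0^x\{t\}\sum_{a\le\min\{x,x/t\}}a\chi(a)\,dt.$$

Using the Pólya-Vinogradov inequality and partial summation, we find that

$$\sum_{a>x}\frac{\chi(a)}{a}\ll\frac{\sqrt q\log q}{x}\quad\text{and}\quad\sum_{a\le y}a\chi(a)\ll y\sqrt q\log q$$

uniformly for $x,y\ge1$. Consequently,

$$\sum_{n\le x}(1*\chi)(n)(1-n/x)=\frac{xL(1,\chi)}{2}+O\big(\sqrt q(\log q)(\log x)\big).$$

On the other hand,

$$\sum_{n\le x}(1*\chi)(n)(1-n/x)\ge\sum_{m\le\sqrt x}(1-m^2/x)\gg\sqrt x.$$

Comparing the above estimates when $x=cq(\log q)^4$ for a large enough constant $c$ (so that $\sqrt x$ dominates the error term $\sqrt q(\log q)(\log x)\asymp\sqrt q\log^2q$, whence $L(1,\chi)\gg1/\sqrt x$) completes the proof of the theorem.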
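As a sanity check of the estimate $\sum_{n\le x}(1*\chi)(n)(1-n/x)=xL(1,\chi)/2+O(\sqrt q(\log q)(\log x))$, the sketch below evaluates the smoothed sum through the closed form $\sum_{b\le N}(1-ab/x)=N-aN(N+1)/(2x)$ with $N=\lfloor x/a\rfloor$, for the Legendre symbol mod 7 and $x=10^5$. For comparison it uses the classical value $L(1,(\cdot/7))=\pi/\sqrt7$ from the analytic class number formula ($\mathbb{Q}(\sqrt{-7})$ has class number 1), which is an outside fact and not part of the argument above.

```python
import math

def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def smoothed_sum(x, p):
    """sum_{n <= x} (1 * chi)(n) (1 - n/x) for chi = (./p) and an integer x."""
    total = 0.0
    for a in range(1, x + 1):
        c = legendre(a, p)
        if c:
            n_b = x // a  # number of b with b <= x/a
            total += c * (n_b - a * n_b * (n_b + 1) / (2 * x))
    return total

x = 10**5
estimate = 2 * smoothed_sum(x, 7) / x
print(estimate, math.pi / math.sqrt(7))
```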


Theorems 12.4 and 12.8 establish Theorem 12.1 uniformly for all moduli $q\le(\log x)(\log\log x)^{-8}$. In order to handle larger $q$, we need a strengthening of Theorem 12.8 due to Siegel.


Theorem 12.9 (Siegel). Let $\varepsilon>0$. There is a constant $c(\varepsilon)>0$ such that for all real, primitive, non-principal Dirichlet characters $\chi\ (\mathrm{mod}\ q)$, we have

$$L(\sigma,\chi)\neq0\quad\text{when}\quad\sigma\ge1-c(\varepsilon)q^{-\varepsilon},$$

with the possible exception of one character $\chi_1\ (\mathrm{mod}\ q_1)$.




Proof. We follow an argument due to Goldfeld [60]. Clearly, by taking $c(\varepsilon)\le\varepsilon/2$, we may assume that there is at least one primitive, non-principal character whose $L$-function has a zero in $[1-\varepsilon/2,1]$. Let $\chi_1$ be such a character of minimal conductor $q_1>1$.

Now let $\chi\ (\mathrm{mod}\ q)$ be a different real, primitive, non-principal Dirichlet character. If $q<q_1$, then the claimed zero-free region follows by the minimality of $q_1$, so assume that $q\ge q_1$.

We argue similarly to Theorem 12.8, only this time we replace $1*\chi$ by $f=1*\chi*\chi_1*\chi\chi_1$. This function is also non-negative; the easiest way to see this is by examining the logarithm of its Dirichlet series²

$$F(s)=\zeta(s)L(s,\chi_1)L(s,\chi)L(s,\chi\chi_1).$$

Let $\beta_1$ be the rightmost zero of $L(s,\chi_1)$ in the interval $[1-\varepsilon/2,1]$, let $\phi\in C^\infty(\mathbb{R}_{\ge0})$ be such that $1_{[0,1]}\le\phi\le1_{[0,2]}$, and consider the auxiliary sum

$$S=\sum_{n\ge1}\frac{f(n)}{n^{\beta_1}}\,\phi(n/x).$$

On the one hand, we have the trivial lower bound $S\ge1$ by dropping all summands except for the one with $n=1$. On the other hand, we can evaluate $S$ using Mellin inversion: Exercise 7.2(d) implies that

$$S=I_2,\quad\text{where}\quad I_\alpha:=\frac{1}{2\pi i}\int_{\mathrm{Re}(s)=\alpha}F(s+\beta_1)x^s\Phi(s)\,ds,$$

with $\Phi$ denoting the Mellin transform of $\phi$. Since $F(\beta_1)=0$, the only pole of the integrand is at $s=1-\beta_1$, which is a positive number by Theorem 12.8. Shifting the contour to the line $\mathrm{Re}(s)=-1$, we find that

$$S=x^{1-\beta_1}L(1,\chi_1)L(1,\chi)L(1,\chi\chi_1)\Phi(1-\beta_1)+I_{-1}.$$

When $s=-1+it$, we have $|F(s+\beta_1)|\ll\max\{q,|t|\}^{c_3}$ for some absolute constant $c_3>0$ by Exercise (the characters $\chi$, $\chi_1$ and $\chi\chi_1$ all have conductor $\le q_1q\le q^2$). Since $|\Phi(-1+it)|\ll1/(1+|t|)^{c_3+2}$ by Exercise 7.2(c), we find that $I_{-1}=O(q^{c_3}/x)$. Taking $x=c_4q^{c_3}$ for a large enough constant $c_4$ makes $|I_{-1}|\le1/2$. Recalling that $S\ge1$, we infer that

$$c_4^{1-\beta_1}q^{c_3(1-\beta_1)}L(1,\chi_1)L(1,\chi)L(1,\chi\chi_1)\Phi(1-\beta_1)\ge1/2.$$

Next, we note that $L(1,\chi\chi_1)\ll\log q$ by Lemma 11.2, $\Phi(1-\beta_1)\le\int_0^2y^{-\beta_1}\,dy\le2/(1-\beta_1)$, and $L(1,\chi_1)\ll(1-\beta_1)q_1^{(1-\beta_1)/2}\log^2q_1$ by (12.11). Since we also have $1-\beta_1\le\varepsilon/2$, we conclude that

$$L(1,\chi)\gg\frac{q^{-(c_3+1/2)(1-\beta_1)}}{\log^3q}\gg_\varepsilon q^{-(c_3+1)\varepsilon/2}.$$


²This is the Dedekind zeta function of the biquadratic field $\mathbb{Q}(\sqrt{\delta q},\sqrt{\delta_1q_1})$, where $\delta,\delta_1$ are appropriate signs (see footnote 1 on page 124).




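Together with (12.12), this proves that $L(\sigma,\chi)\neq0$ for $\sigma\ge1-c_\varepsilon q^{-(1+c_3)\varepsilon}$, for some constant $c_\varepsilon>0$. Replacing $\varepsilon$ by $\varepsilon/(c_3+1)$ completes the proof.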


The possible exceptional character $\chi_1$ in Theorem 12.9 causes a subtle but significant problem: since we know nothing about it, we can at best use Theorem 12.8 to say that $L(\sigma,\chi_1)$ has no zeroes when $\sigma\ge1-c/(\sqrt{q_1}\log^4q_1)$ for some absolute constant $c>0$. Of course, $q_1$ here is a constant (the conductor of the hypothetical unique exceptional character $\chi_1$), so there is some constant $\tilde c=\tilde c(\varepsilon,q_1)$ such that $L(\sigma,\chi_1)$ has no zeroes for $\sigma\ge1-\tilde cq_1^{-\varepsilon}$. However, since we have no control over $q_1$, it is impossible to compute a specific value of $\tilde c$. Notice that this is not due to a lack of computing power, but because of the argument producing $\tilde c$. We then say that "$\tilde c$ cannot be computed effectively". We have thus arrived at the ineffective form of Theorem 12.9, known as Siegel's theorem.


Theorem 12.10 (Siegel). Let $\varepsilon>0$. There is a constant $c(\varepsilon)>0$ (that cannot be computed effectively) such that

$$L(\sigma,\chi)\neq0\quad\text{when}\quad\sigma\ge1-c(\varepsilon)q^{-\varepsilon}$$

for all real, non-principal Dirichlet characters $\chi\ (\mathrm{mod}\ q)$.


Proof. We have already treated the primitive characters. The non-primitive 
characters are dealt with via formula (11.2). 


Combining Siegel's theorem with Theorem 12.4 completes the proof of Theorem 12.1. Note, however, that the ineffectivity of Theorem 12.10 transfers to the implicit constant in Theorem 12.1. As a consequence, results proven using Theorem 12.1 are generally not amenable to numerical analysis. There are some exceptions to this rule, as it is sometimes possible to isolate the influence of the exceptional character $\chi_1$ in Theorem 12.9.


We will revisit exceptional characters in Chapters 22 and 27.


Exercises 


Exercise 12.1. Adapt the argument of Exercise 8.4 to prove that there is a constant $c>0$ such that for each fixed $A>0$ we have

$$\sum_{n\le x}\mu(n)\chi(n)\ll_Axe^{-c\sqrt{\log x}}\qquad(1\le q\le(\log x)^A,\ \chi\ (\mathrm{mod}\ q)).$$
Exercise 12.2. Let $\chi$ be a Dirichlet character mod $q$ and $x\ge q\ge3$.
(a) Prove that there is an absolute constant $c>0$ such that

$$\psi(x,\chi)=1_{\chi=\chi_0}x-\frac{x^{\beta_1}}{\beta_1}+O\big(xe^{-c\sqrt{\log x}}+(\log q)^2x^{1-c/\log q}\big),$$




where the term $x^{\beta_1}/\beta_1$ is present only when $\chi$ is the exceptional character from Theorem 12.3.

(b) Let $\phi$ and $\Phi$ be as in Exercise 7.2. Prove that

$$\sum_{n\ge1}\Lambda(n)\chi(n)\phi(n/x)=1_{\chi=\chi_0}\Phi(1)x-x^{\beta_1}\Phi(\beta_1)+O_\phi\big(xe^{-c\sqrt{\log x}}+(\log q)x^{1-c/\log q}\big),$$

where the term $x^{\beta_1}\Phi(\beta_1)$ is present only when $\chi$ is the exceptional character from Theorem 12.3.


Exercise 12.3* ([114, Lemma 18.4]). Let $\chi$ be a real, non-principal character mod $q$, and let $\beta\in[\ \ ,1]$ be a zero of $L(s,\chi)$.


(a) For x > y > 1, prove that 
1+ x(p) (L*x)(m)\ - (1 * x)(m) 
z ae 


(b) For $N\ge q$, prove that

$$\sum_{n\le N}\frac{(1*\chi)(n)}{n}=L(1,\chi)(\log N+\gamma)+L'(1,\chi)+O\big(q^{1/4}N^{-1/2}\log N\big)$$

using the hyperbola method.
(c) If $\phi\in C^\infty(\mathbb{R})$ is such that $1_{[0,1/2]}\le\phi\le1_{[0,1]}$, then show that


1x 1 1x (n 1, 
> GP ewe) (extn) yo > xh penny) 


as long as y > q” and q is large saat depending only on ¢. 
(d) Deduce that 


n<y 


ye 1¥x) ~ (1— B)logx (x >y>q°). 


y<pRgu 


Exercise 12.4 (Alternative proof of a weak version of Theorem 12.3). Let $\chi\ (\mathrm{mod}\ q)$ be a Dirichlet character, $t\in\mathbb{R}$ and $\tau=\max\{|t|,2\}$. Assume that either $\chi$ is complex or $|t|\ge1$.

(a) Using the 3-4-1 inequality (8.6), prove that
$$|L(\sigma,\chi_0)|^3|L(\sigma+it,\chi)|^4|L(\sigma+2it,\chi^2)|\ge1\qquad\text{for }\sigma>1.$$
[Hint: Recall that $\log|z|=\mathrm{Re}(\log z)$.]
(b) For $\sigma\in(1,2]$, show that $|L(\sigma+it,\chi)|\gg(\sigma-1)^{3/4}/\log^{1/4}(q\tau)$.
(c) For $\sigma,\sigma'\ge1-1/\log(q\tau)$, show that
$$L(\sigma'+it,\chi)=L(\sigma+it,\chi)+O(|\sigma'-\sigma|\log^2(q\tau)).$$
Conclude that there is a constant $c>0$ such that

$$|L(\sigma+it,\chi)|\gg1/\log^7(q\tau)\qquad\text{for }\sigma\ge1-c/\log^9(q\tau).$$
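Part (a) of Exercise 12.4 holds prime by prime: each local factor contributes a non-negative amount to the logarithm, because $3+4\cos\theta+\cos2\theta=2(1+\cos\theta)^2\ge0$. The sketch below checks this numerically for truncated Euler products (primes below 3000), using the complex character mod 5 with $\chi(2)=i$; the truncation and the sample values of $\sigma$ and $t$ are arbitrary. It illustrates the inequality and is not a solution of the exercise.

```python
import math

CHI5 = {0: 0, 1: 1, 2: 1j, 3: -1j, 4: -1}   # complex character mod 5, chi(2) = i
chi = lambda n: CHI5[n % 5]
chi0 = lambda n: 0 if n % 5 == 0 else 1     # principal character mod 5

def log_local(value, p, s):
    """log |1 - value * p^{-s}|^{-1}, the log-modulus of one Euler factor."""
    return -math.log(abs(1 - value * p ** (-s)))

def log_341_product(sigma, t, primes):
    """log of |L(sigma,chi0)|^3 |L(sigma+it,chi)|^4 |L(sigma+2it,chi^2)|, truncated to `primes`."""
    total = 0.0
    for p in primes:
        total += 3 * log_local(chi0(p), p, sigma)
        total += 4 * log_local(chi(p), p, complex(sigma, t))
        total += log_local(chi(p) ** 2, p, complex(sigma, 2 * t))
    return total

PRIMES = [p for p in range(2, 3000) if all(p % d for d in range(2, math.isqrt(p) + 1))]
for sigma in (1.01, 1.1, 1.5):
    for t in (0.5, 3.7, 14.1):
        print(sigma, t, log_341_product(sigma, t, PRIMES) >= 0)
```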


Part 3


Multiplicative functions and the anatomy of integers


Chapter 13


Primes and multiplicative functions


There is a strong connection between the distribution of primes and the average behavior of multiplicative functions. Landau proved that the Prime Number Theorem is elementarily equivalent to the relation $\sum_{n\le x}\mu(n)=o_{x\to\infty}(x)$ (see Exercise 8.15). In a similar vein, the Riemann Hypothesis is equivalent to the bound $|\sum_{n\le x}\mu(n)|\le x^{1/2+o(1)}$ as $x\to\infty$ (see Exercise 8.6), whereas the Generalized Riemann Hypothesis amounts to showing the same estimate for the partial sums of $\mu\chi$ for each character $\chi\ (\mathrm{mod}\ q)$.


In order to better understand the interplay between primes and multiplicative functions, we adopt a more general point of view. Much of what we have done so far can be roughly described as follows: we are given an interesting arithmetic sequence indexed by primes, say $f(2),f(3),f(5),\dots$, and we want to understand its partial sums. To accomplish this, we consider a special generating function: the Dirichlet series $\sum_pf(p)/p^s$. In certain fortuitous situations, this series is related to the logarithm of the Dirichlet series $F(s)=\sum_{n=1}^\infty f(n)/n^s$ of a “nice” multiplicative function $f$. Analyzing averages of $f(n)$ thus gives us information on averages of $f(p)$.

We now explore the converse direction: assuming we have good bounds on $\sum_{p\le x}f(p)$, we seek estimates for $\sum_{n\le x}f(n)$. We will accomplish this goal when the sequence $f(p)$ is roughly constant on average. More precisely, we assume that there is some $\kappa\in\mathbb{C}$ and some parameter $Q\ge2$ such that

$$\sum_{p\le x}f(p)\log p=\kappa x+O_A\big(x/(\log x)^A\big)\qquad(x\ge Q)\tag{13.1}$$

for each fixed $A>0$. We think of $\kappa$ as being fixed and $Q$ as varying.




We must further impose a growth condition on $f$ that prevents abnormally large values from ruling its partial sums. A simple way of achieving this is to assume that there is a fixed $k\in\mathbb{N}$ such that

$$|f|\le\tau_k.\tag{13.2}$$

We call such a function divisor-bounded.


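For instance, if $f=\mu$, then (13.1) and (13.2) hold with $\kappa=-1$, $k=1$ and $Q=2$, whereas when $f=\chi$ for a non-principal character $\chi\ (\mathrm{mod}\ q)$, then $\kappa=0$, $k=1$ and $Q=\exp\{q^\varepsilon\}$ by the Siegel-Walfisz theorem (Theorem 12.1). A more unusual but still natural example is given by the function $\tilde\mu(n)=\mu^2(n)(-1)^{\omega(n;5,1)}$, where $\omega(n;5,1)=\#\{p\mid n:p\equiv1\ (\mathrm{mod}\ 5)\}$. Indeed, $\tilde\mu(p)=-1$ for the primes $p\equiv1\ (\mathrm{mod}\ 5)$, which have relative density $1/4$, and $\tilde\mu(p)=1$ otherwise, so $\tilde\mu$ satisfies (13.1) with $\kappa=1/2$ and $Q=2$, whereas (13.2) holds with $k=1$. We already have good predictions for the partial sums of $\mu$ and $\chi$, but what about the average behavior of $\tilde\mu$?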


Generalized divisor functions 


The study of the above general class of functions $f$ can be reduced to certain canonical representatives. These are generalizations of the combinatorially defined divisor functions $\tau_m$, $m\in\mathbb{N}$. Recall that the Dirichlet series of $\tau_m$ is $\zeta(s)^m$. We then define $\tau_\kappa$ for $\kappa\in\mathbb{C}$ to be the arithmetic function whose Dirichlet series is $\zeta(s)^\kappa$. Using the Euler product representation of $\zeta$ and the Taylor series expansion of $(1-x)^{-\kappa}$ about $x=0$, we find that

$$\tau_\kappa(p^a)=\binom{\kappa+a-1}{a}.\tag{13.3}$$

In particular, $\tau_\kappa(p)=\kappa$, so that (13.1) holds by the Prime Number Theorem.
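Formula (13.3) makes $\tau_\kappa$ easy to compute exactly, and the identity $\zeta(s)^\kappa\zeta(s)^\lambda=\zeta(s)^{\kappa+\lambda}$ becomes $\tau_\kappa*\tau_\lambda=\tau_{\kappa+\lambda}$. The sketch below (plain Python with exact rational arithmetic; all helper names are ours) can be used to check on small $n$ that $\tau_{1/2}*\tau_{1/2}=\tau_1=1$, that $\tau_{-1}=\mu$ and that $\tau_2$ is the usual divisor function.

```python
from fractions import Fraction

def factorize(n):
    """Prime factorization of n >= 1 as a dict {p: a}, by trial division."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def tau(kappa, n):
    """tau_kappa(n) via (13.3): tau_kappa(p^a) = binom(kappa + a - 1, a), extended multiplicatively."""
    value = Fraction(1)
    for a in factorize(n).values():
        for i in range(a):  # binom(kappa + a - 1, a) = prod_{i < a} (kappa + i) / (i + 1)
            value *= Fraction(kappa + i) / (i + 1)
    return value

def convolve(f, g, n):
    """Dirichlet convolution (f * g)(n)."""
    return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)

half = Fraction(1, 2)
print([tau(half, n) for n in (2, 4, 8)])   # values 1/2, 3/8 and 5/16 (as Fraction objects)
```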


To relate a general function $f$ satisfying (13.1) to the function $\tau_\kappa$, we go back to the basics. Note that the average value of $f(p)-\tau_\kappa(p)$ is zero. Hence, if we write $f=\tau_\kappa*g$, then $g(p)=f(p)-\kappa$ is zero on average. We might thus guess that $g$ has small partial sums. Dirichlet's hyperbola method would then suggest that $f$ and $\tau_\kappa$ behave very similarly on average.


A more analytic way to view the above argument is to consider $F(s)$, the Dirichlet series of $f$. We then roughly have $F(s)\zeta(s)^{-\kappa}\approx\exp\{\sum_p(f(p)-\kappa)/p^s\}$. For this reason, (13.1) is essentially equivalent to the function $F\zeta^{-\kappa}$ having a $C^\infty$-extension to the half-plane $\mathrm{Re}(s)\ge1$ (see Lemma 13.5(a) below). For simplicity, let us assume momentarily a stronger version of (13.1), with an error term of size $O(x^{1-\delta})$ for some $\delta>0$. Then, we can analytically continue $F\zeta^{-\kappa}$ to the half-plane $\mathrm{Re}(s)>1-\delta$. Hence, we see from (5.14) that the partial sums of $g=f*\tau_{-\kappa}$ should indeed be rather small (there are no residue contributions to the right side of (5.14)). This allows us to relate averages of $f$ and $\tau_\kappa$ via the hyperbola method.




Let us now study the partial sums of the prototypical function $\tau_\kappa$. We begin, as usual, by invoking Perron's inversion formula: for any $x\notin\mathbb{Z}$ and any $\alpha\in(1,1+1/\log x]$, we have

$$\sum_{n\le x}\tau_\kappa(n)=\frac{1}{2\pi i}\int_{(\alpha)}\zeta(s)^\kappa\frac{x^s}{s}\,ds.\tag{13.4}$$

When $\kappa$ is an integer, $\zeta^\kappa$ has a meromorphic continuation to $\mathbb{C}$, so the usual contour shifting argument can be used to estimate $\sum_{n\le x}\tau_\kappa(n)$. However, when $\kappa\notin\mathbb{Z}$, the function $\zeta^\kappa$ is only defined where $\log\zeta(s)$ is. In particular, since $\zeta$ has a pole at $s=1$, we can only define $\zeta(s)^\kappa$ in a simply connected set of the complex plane that does not contain 1, nor any of the zeroes of $\zeta$. There is no such domain containing a punctured disk centered at 1. Hence, it is not possible to employ Cauchy's residue theorem to study the contribution of the singularity at $s=1$ to the partial sums of $\tau_\kappa$, which means that we must develop a new method to deal with the integral in (13.4).


The LSD method 


The main idea for estimating the integral in (13.4) goes back to work of Landau and was further developed by Selberg and Delange. Here, we present an adaptation of their technique that appeared in [68], which builds upon ideas in [, Section 2.4]. The original method of Landau-Selberg-Delange (called the LSD method for brevity) is presented in great detail in Chapter II.5 of Tenenbaum's book [172]. We also outline it in Exercise 13.6.


For simplicity, assume that $\kappa>1$. Note that the integrand $\zeta(s)^\kappa x^s/s$ blows up to $\infty$ when $s\to1$. On the other hand, Exercise shows that $\zeta(s)^\kappa x^s/s\ll x|t|^{-1/2}=o_{|t|\to\infty}(x)$ when $s=\sigma+it$ with $1<\sigma\le1+1/\log x$. Thus, if we take $\alpha$ sufficiently close to 1, it seems reasonable to expect that most of the contribution to the integral in (13.4) comes from $s$ close to 1. For such $s$, we have that $\zeta(s)^\kappa/s\approx1/(s-1)^\kappa$. This leads us to guess that

$$\sum_{n\le x}\tau_\kappa(n)\approx\frac{1}{2\pi i}\int_{(\alpha)}\frac{x^s}{(s-1)^\kappa}\,ds.\tag{13.5}$$

The right-hand side of the above formula can be computed using Lemma 13.1 below, which is called Hankel's formula.


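Lemma 13.1 (Hankel's formula). Let $x>1$, $a>0$ and $\mathrm{Re}(\kappa)>1$. Then

$$\frac{1}{2\pi i}\int_{(a)}\frac{x^s}{s^\kappa}\,ds=\frac{(\log x)^{\kappa-1}}{\Gamma(\kappa)}.$$

If, in addition, $a>1$, then we have

$$\frac{1}{2\pi i}\int_{(a)}\frac{x^s}{s(s-1)^\kappa}\,ds=\frac{1}{\Gamma(\kappa)}\int_1^x(\log y)^{\kappa-1}\,dy.$$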




Proof. We have $\int_1^\infty(\log x)^{\kappa-1}x^{-s-1}\,dx=s^{-\kappa}\Gamma(\kappa)$ for $\mathrm{Re}(s)>0$. (Justify why this is true for all $s$ with $\mathrm{Re}(s)>0$.) Since $\mathrm{Re}(\kappa)>1$, the function $1/s^\kappa$ is absolutely integrable on every vertical line $\mathrm{Re}(s)=a$ with $a\neq0$. Thus, the Mellin inversion formula implies that $(1/2\pi i)\int_{(a)}y^ss^{-\kappa}\,ds=1_{y>1}(\log y)^{\kappa-1}/\Gamma(\kappa)$ for any $a>0$, which proves the first part of the lemma. Integrating over $y\in[0,x]$ proves the second part too, with $a+1$ in place of $a$.
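The first step of the proof, $\int_1^\infty(\log x)^{\kappa-1}x^{-s-1}\,dx=\Gamma(\kappa)s^{-\kappa}$, becomes $\int_0^\infty u^{\kappa-1}e^{-su}\,du$ after the substitution $x=e^u$. The following sketch checks it with the composite Simpson rule on a truncated interval, for one real and one complex value of $s$ with $\mathrm{Re}(s)>0$ and the sample value $\kappa=2.5$; the truncation point and step count are ad hoc choices of ours.

```python
import cmath
import math

def gamma_integral(kappa, s, upper=80.0, steps=200_000):
    """Simpson approximation of int_0^upper u^(kappa-1) e^(-s u) du (assumes Re(kappa) > 1)."""
    h = upper / steps
    total = 0j
    for j in range(steps + 1):
        u = j * h
        value = u ** (kappa - 1) * cmath.exp(-s * u) if u > 0 else 0.0
        weight = 1 if j in (0, steps) else (4 if j % 2 else 2)
        total += weight * value
    return total * h / 3

kappa = 2.5
for s in (1.3, 1.3 + 0.7j):
    approx = gamma_integral(kappa, s)
    exact = math.gamma(kappa) * s ** (-kappa)   # principal branch of s^{-kappa}
    print(s, approx, exact)
```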


The above discussion leads us to conjecture that

$$\sum_{n\le x}\tau_\kappa(n)\sim\frac{x(\log x)^{\kappa-1}}{\Gamma(\kappa)}\qquad(x\to\infty).\tag{13.6}$$

Note that this agrees with Theorem 7.4 when $\kappa\in\mathbb{N}$. Remarkably, it also agrees with what we know for the Möbius function. Indeed, when $\kappa=-1$, we have $\tau_{-1}=\mu$, for which we know that $\sum_{n\le x}\mu(n)=o_{x\to\infty}(x)$. On the other hand, the right-hand side of (13.6) vanishes because of the pole of the Gamma function at $s=-1$.
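A quick experiment makes (13.6) plausible even for non-integer $\kappa$. The sketch below computes $\tau_\kappa(n)$ from (13.3) with a smallest-prime-factor sieve (in floating point) and compares $\sum_{n\le x}\tau_\kappa(n)$ with $x(\log x)^{\kappa-1}/\Gamma(\kappa)$ at $x=10^5$. Convergence is slow, since lower-order terms are smaller only by a factor of about $1/\log x$, so agreement within a few percent is all one should expect. For $\kappa=-1$ the prediction is $0$, and we simply report $\sum_{n\le x}\mu(n)/x$.

```python
import math

def tau_partial_sum(kappa, x):
    """sum_{n <= x} tau_kappa(n), with tau_kappa(p^a) = binom(kappa + a - 1, a) as in (13.3)."""
    spf = list(range(x + 1))                     # smallest prime factor table
    for p in range(2, math.isqrt(x) + 1):
        if spf[p] == p:
            for m in range(p * p, x + 1, p):
                if spf[m] == m:
                    spf[m] = p
    total = 1.0                                  # the term n = 1
    for n in range(2, x + 1):
        value, m = 1.0, n
        while m > 1:
            p, a = spf[m], 0
            while m % p == 0:
                m //= p
                a += 1
            for i in range(a):                   # multiply by binom(kappa + a - 1, a)
                value *= (kappa + i) / (i + 1)
        total += value
    return total

x = 10**5
ratios = {}
for kappa in (0.5, 2.0):
    prediction = x * math.log(x) ** (kappa - 1) / math.gamma(kappa)
    ratios[kappa] = tau_partial_sum(kappa, x) / prediction
mertens_ratio = tau_partial_sum(-1.0, x) / x     # tau_{-1} = mu, predicted size o(x)
print(ratios, mertens_ratio)
```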


The main goal of this chapter is to establish an appropriate version of (13.6) for all multiplicative functions $f$ satisfying (13.1) and (13.2). Under the same assumptions, we will show that the asymptotic behavior of $\sum_{n\le x}f(n)$ is determined by the analytic behavior of the Dirichlet series $F(s)$ when $s\approx1$.

As we mentioned earlier, $F(s)\zeta(s)^{-\kappa}$ admits a $C^\infty$-extension to the half-plane $\mathrm{Re}(s)\ge1$ under the assumptions of (13.1) and (13.2). Thus, the same must be true for the function $F(s)(s-1)^\kappa$ because $\zeta(s)(s-1)$ is analytic and non-zero in an open neighborhood of the half-plane $\mathrm{Re}(s)\ge1$. We then let

$$c_j=\frac{1}{j!}\frac{d^j}{ds^j}\Big((s-1)^\kappa F(s)\Big)\Big|_{s=1}\quad\text{and}\quad\tilde c_j=\frac{1}{j!}\frac{d^j}{ds^j}\Big(\frac{(s-1)^\kappa F(s)}{s}\Big)\Big|_{s=1}\tag{13.7}$$

be the Taylor coefficients about 1 of the functions $(s-1)^\kappa F(s)$ and $(s-1)^\kappa F(s)/s$, respectively. Since $s=1+(s-1)$ and $1/s=1-(s-1)+(s-1)^2-\cdots$ for $|s-1|<1$, these coefficients are linked by the relations

$$\tilde c_j=\sum_{a=0}^j(-1)^ac_{j-a}\quad\text{and}\quad c_j=\tilde c_j+\tilde c_{j-1}\qquad\text{for }j=0,1,\dots,\tag{13.8}$$

with the convention that $\tilde c_{-1}=0$. Moreover, since $\zeta(s)\sim1/(s-1)$ as $s\to1^+$ and $f$ is multiplicative, we have that

$$c_0=\tilde c_0=\prod_p\Big(1+\frac{f(p)}{p}+\frac{f(p^2)}{p^2}+\cdots\Big)\Big(1-\frac1p\Big)^\kappa.\tag{13.9}$$

We will prove in Lemma 13.5(b) that $c_j,\tilde c_j\ll_j(\log Q)^{j+2k}$.
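The relations (13.8) amount to multiplying and dividing power series in $s-1$ by $s=1+(s-1)$. A minimal sketch with exact fractions (the function names are ours):

```python
from fractions import Fraction

def tilde_from_c(c):
    """c~_j = sum_{a=0}^{j} (-1)^a c_{j-a}: Taylor coefficients of G(s)/s about s = 1 from those of G(s)."""
    return [sum((-1) ** a * c[j - a] for a in range(j + 1)) for j in range(len(c))]

def c_from_tilde(ct):
    """c_j = c~_j + c~_{j-1}, with the convention c~_{-1} = 0."""
    return [ct[j] + (ct[j - 1] if j > 0 else 0) for j in range(len(ct))]

# G(s) = s = 1 + (s - 1) has c = [1, 1, 0, 0], and G(s)/s = 1 has c~ = [1, 0, 0, 0].
print(tilde_from_c([1, 1, 0, 0]))   # [1, 0, 0, 0]
```

The two maps are inverse to each other on truncated coefficient lists, as the second relation in (13.8) predicts.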


With the above notation, our main theorem is the following. 




Theorem 13.2. Fix $\varepsilon>0$ and $J\in\mathbb{N}$. If $f$, $c_j$ and $\tilde c_j$ are as above, then


= 
oo) ae ele Qe 
(13.10) a n= a i Te py 0 eS ata 
ae ae ae a(log Q)**t771 
(13.11) Ta (kj — +0( see at 


for $x\ge e^{(\log Q)^{1+\varepsilon}}$. The implied constants depend at most on $k$, $J$, $\varepsilon$ and the implied constant in (13.1) for $A$ large enough in terms of $k$, $J$ and $\varepsilon$.


Remark 13.3. Note that when $\kappa\in\mathbb{Z}_{\le0}$, then all the main terms in (13.10) vanish because of the poles of the Gamma function at $0,-1,-2,\dots$. Hence, for each fixed $\varepsilon>0$ and $A>0$, we infer that

$$\sum_{n\le x}f(n)\ll_{A,\varepsilon}x/(\log x)^A\qquad\big(x\ge e^{(\log Q)^{1+\varepsilon}}\big).\tag{13.12}$$

On the other hand, when $c_0\neq0$ and $\kappa\notin\mathbb{Z}_{\le0}$, we see that $\sum_{n\le x}f(n)$ is much larger, of size $x(\log x)^{\mathrm{Re}(\kappa)-1}$.


For example, for the function $\tilde\mu(n)=\mu^2(n)(-1)^{\omega(n;5,1)}$ that we saw above, we have $\kappa=1/2$ and $c_0>0$. Hence, $\sum_{n\le x}\tilde\mu(n)$ is of size $x/(\log x)^{1/2}$. This might be a bit surprising because it is in stark contradiction with a common heuristic argument for the Riemann Hypothesis.


Indeed, given a square-free integer $n$, note that $\tilde\mu(n)=1$ when $\omega(n;5,1)$ is even, while $\tilde\mu(n)=-1$ when $\omega(n;5,1)$ is odd. A similar situation is true for the Möbius function, with $\omega(n;5,1)$ replaced by $\omega(n)$. Since there is no reason to suspect any bias for the parity of the functions $\omega(n;5,1)$ and $\omega(n)$, we may be tempted to model $\mu$ and $\tilde\mu$ by a sequence of random, independent and equiprobable assignments of $+1$ or $-1$ to each square-free integer. The Central Limit Theorem would then predict that $|\sum_{n\le x}\mu(n)|\le x^{1/2+o(1)}$ and $|\sum_{n\le x}\tilde\mu(n)|\le x^{1/2+o(1)}$. While the former estimate is believed to be true in virtue of the Riemann Hypothesis, the second one is very far from the truth.
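The contrast between $\mu$ and $\tilde\mu$ is already visible for modest $x$. The sketch below sieves both functions up to $x=10^5$ and prints their partial sums; it only illustrates the discussion above and proves nothing about the asymptotic sizes. The function name is ours.

```python
def partial_sums_mu_and_mu_tilde(x):
    """Return (sum_{n<=x} mu(n), sum_{n<=x} mu~(n)), where mu~(n) = mu(n)^2 (-1)^{omega(n;5,1)}."""
    mu = [1] * (x + 1)
    mu_tilde = [1] * (x + 1)
    composite = bytearray(x + 1)
    for p in range(2, x + 1):
        if composite[p]:
            continue
        composite[2 * p :: p] = b"\x01" * len(range(2 * p, x + 1, p))
        sign = -1 if p % 5 == 1 else 1          # mu~(p) = -1 exactly when p = 1 (mod 5)
        for m in range(p, x + 1, p):
            mu[m] = -mu[m]
            mu_tilde[m] *= sign
        for m in range(p * p, x + 1, p * p):     # n divisible by p^2 is not square-free
            mu[m] = 0
            mu_tilde[m] = 0
    return sum(mu[1:]), sum(mu_tilde[1:])

M, S = partial_sums_mu_and_mu_tilde(10**5)
print(M, S)
```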


In conclusion, we should be very careful when using probabilistic argu- 
ments of the above sort to analyze partial sums of multiplicative functions, 
because their values are interdependent in a fundamental way. For instance, 
if n is odd, then we always have that f(2n) = f(2)f(n), which means that 
the values f(2n) and f(n) are highly correlated. 


Before going on to prove Theorem 13.2, we record an important consequence of it.




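Corollary 13.4. Fix $A,C\ge1$ and $\varepsilon\in(0,1/2]$. Let $x\ge2$ and $q,m\in\mathbb{N}$ with $q\le(\log x)^C$ and $\omega(m)\le\exp\{(\log x)^{1-\varepsilon}\}$. Then

$$\sum_{\substack{n\le x,\ (n,m)=1\\ n\equiv a\ (\mathrm{mod}\ q)}}\mu(n)\ll_{A,C,\varepsilon}\frac{x}{(\log x)^A}.$$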


Proof. Let $d=(a,q)$ and write $a=da_1$ and $q=dq_1$. If $n\equiv a\ (\mathrm{mod}\ q)$, then $d\mid n$. Hence, for the sum in the statement of the corollary to have any terms, we must have $(d,m)=1$. In this case, we write $n=dr$, so that

$$\sum_{\substack{n\le x,\ (n,m)=1\\ n\equiv a\ (\mathrm{mod}\ q)}}\mu(n)=\sum_{\substack{r\le x/d,\ (r,m)=1\\ r\equiv a_1\ (\mathrm{mod}\ q_1)}}\mu(dr)=\mu(d)\sum_{\substack{r\le x/d,\ (r,dm)=1\\ r\equiv a_1\ (\mathrm{mod}\ q_1)}}\mu(r).$$

Since $(a_1,q_1)=1$, we may expand the condition $r\equiv a_1\ (\mathrm{mod}\ q_1)$ using Dirichlet characters. We thus find that

$$\sum_{\substack{n\le x,\ (n,m)=1\\ n\equiv a\ (\mathrm{mod}\ q)}}\mu(n)=\frac{\mu(d)}{\varphi(q_1)}\sum_{\chi\ (\mathrm{mod}\ q_1)}\bar\chi(a_1)\sum_{\substack{r\le x/d\\ (r,dm)=1}}\mu(r)\chi(r).\tag{13.13}$$

Fix $\chi\ (\mathrm{mod}\ q_1)$ and note that $x/d\ge x/q\ge x/(\log x)^C$. We shall apply Theorem 13.2 with $f(n)=1_{(n,dm)=1}\mu(n)\chi(n)$. For this function, Theorem 12.1 implies that

$$\sum_{p\le w}f(p)\log p=-\sum_{p\le w}\chi(p)\log p+O(\omega(dm)\log w)=-1_{\chi=\chi_0}w+O_M\big(we^{-c\sqrt{\log w}}+\omega(dm)\log w\big)$$

for all $w\ge\exp(q^{1/M})$, where $M>0$ is arbitrarily large but fixed, and $c$ is an absolute positive constant. Note that $\omega(d)\ll\log q$ for $d\mid q$. Hence, taking $M=(1+\varepsilon)C$ yields (13.1) with parameters $\kappa=-1_{\chi=\chi_0}$ and

$$\log Q=\max\big\{q^{1/((1+\varepsilon)C)},(\log\omega(m))^{1/(1-\varepsilon^2)}\big\}.$$

Moreover, (13.2) clearly holds with $k=1$. Notice that our assumptions on $x$, $m$ and $q$ imply that

$$\log x\ge\max\big\{q^{1/C},(\log\omega(m))^{1/(1-\varepsilon)}\big\}=(\log Q)^{1+\varepsilon}.$$

Consequently, Theorem 13.2 implies that

$$\sum_{\substack{r\le x/d\\ (r,dm)=1}}\mu(r)\chi(r)\ll\frac{x/d}{(\log(x/d))^A}\ll\frac{x}{(\log x)^A}$$

(all the main terms vanish because either $\kappa=0$ or $\kappa=-1$ here, whence $1/\Gamma(\kappa-j)=0$ for all $j\in\mathbb{Z}_{\ge0}$). Inserting the above estimate into (13.13) completes the proof.
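The reduction to (13.13) is pure bookkeeping, so it can be confirmed exactly on small inputs. The sketch below does this for $q=10$, $a=6$ (so $d=2$, $a_1=3$, $q_1=5$), $m=3$ and $x=3000$, building the characters mod the prime $q_1$ from a primitive root. All parameter values and helper names are illustrative choices of ours.

```python
import cmath
import math

def mobius_table(x):
    mu = [1] * (x + 1)
    composite = bytearray(x + 1)
    for p in range(2, x + 1):
        if composite[p]:
            continue
        composite[2 * p :: p] = b"\x01" * len(range(2 * p, x + 1, p))
        for k in range(p, x + 1, p):
            mu[k] = -mu[k]
        for k in range(p * p, x + 1, p * p):
            mu[k] = 0
    return mu

def characters_mod_prime(p):
    """All Dirichlet characters mod an odd prime p, built from a primitive root."""
    prime_divisors = [r for r in range(2, p) if (p - 1) % r == 0 and all(r % s for s in range(2, r))]
    g = next(g for g in range(2, p) if all(pow(g, (p - 1) // r, p) != 1 for r in prime_divisors))
    index, power = {}, 1
    for e in range(p - 1):            # discrete logarithm table: index[g^e mod p] = e
        index[power] = e
        power = power * g % p
    return [
        (lambda n, k=k: 0 if n % p == 0 else cmath.exp(2 * math.pi * 1j * k * index[n % p] / (p - 1)))
        for k in range(p - 1)
    ]

x, q, a, m = 3000, 10, 6, 3
mu = mobius_table(x)
lhs = sum(mu[n] for n in range(1, x + 1) if math.gcd(n, m) == 1 and n % q == a % q)

d = math.gcd(a, q)
a1, q1 = a // d, q // d               # here q1 = 5 is prime and (d, m) = 1
rhs = 0
for chi in characters_mod_prime(q1):
    inner = sum(mu[r] * chi(r) for r in range(1, x // d + 1) if math.gcd(r, d * m) == 1)
    rhs += chi(a1).conjugate() * inner
rhs *= mu[d] / (q1 - 1)               # phi(q1) = q1 - 1 for prime q1
print(lhs, rhs)
```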




Estimating Perron integrals without shifting contours 


We now turn to the proof of Theorem 13.2. The argument leading to (13.6) can be made rigorous using the methods of Chapter 7, at least when $\mathrm{Re}(\kappa)>1$. However, it cannot produce an approximation for $\sum_{n\le x}\tau_\kappa(n)$ that is strong enough to detect all the lower order terms in the asymptotic estimation of $\sum_{n\le x}\tau_\kappa(n)$. To prove Theorem 13.2, we need one additional idea.


Instead of estimating )¢,,<,7(m), we work with the weighted average 
en<e T(2)(logn)™, where m is a fixed integer at our disposal. It is easy 
to go back and forth between these two sums using partial summation. 
pate the Dirichlet series of 7(n)(logn)” is (—1)™(¢")°)(s), where 
(¢*)(™ denotes the mth derivative of ¢*. Hence 


$$\sum_{n\le x}\tau_\kappa(n)(\log n)^m=\frac{(-1)^m}{2\pi i}\int_{\mathrm{Re}(s)=\alpha}(\zeta^\kappa)^{(m)}(s)\frac{x^s}{s}\,ds \tag{13.14}$$

for $x\notin\mathbb Z$ and any $\alpha>1$. Using Exercise 8.4(c), it is possible to show that $(\zeta^\kappa)^{(m)}(s)/s\ll_m|t|^{-1/2}$ for $\sigma\ge1$ and $|t|\ge1$. This bound tends to 0 as $|t|\to\infty$, no matter how large $m$ is. On the other hand, for $s$ close to 1, we have $(\zeta^\kappa)^{(m)}(s)/s\sim(-1)^m\kappa(\kappa+1)\cdots(\kappa+m-1)/(s-1)^{\kappa+m}$. Choosing $m$ large enough ensures that our integrand is much bigger for small $|t|$ than for large $|t|$. This allows for a much better estimation of the integral on the right-hand side of (13.14). We provide the necessary details below.

We begin with an auxiliary result. We postpone its proof until the end of the chapter, because it is rather technical in the general case. It is easy, however, in the prototypical and important case $f=\tau_\kappa$, for which (13.1) holds with $Q=e$. There, the analyticity and the non-vanishing of $\zeta(s)(s-1)$ when $\mathrm{Re}(s)\ge1$ yield parts (a), (b) and (d), whereas Exercise 8.4 yields part (c).


Lemma 13.5. Let $f$ and $c_j$ be as in the statement of Theorem 13.2, and let $F$ be the Dirichlet series of $f$. All implied constants might depend on $k$ and the implicit constants in (13.1).

(a) $F(s)(s-1)^\kappa$ has a $C^\infty$-extension to the half-plane $\mathrm{Re}(s)\ge1$.
(b) For $j=0,1,2,\dots$, we have $c_j\ll_j(\log Q)^{j+2k}$.
(c) For $m\in\mathbb Z_{\ge0}$ and $\epsilon>0$, we have
$$F^{(m)}(s)\ll_{m,\epsilon}|t|^\epsilon+(\log Q)^{m+2k}\qquad(\sigma\ge1,\ |t|\ge1/\log Q).$$
(d) For $m,J\in\mathbb Z_{\ge0}$ and $|s-1|\le2/\log Q$ with $\sigma>1$, we have
$$(-1)^mF^{(m)}(s)=\sum_{0\le j<J}c_j\frac{\Gamma(\kappa-j+m)}{\Gamma(\kappa-j)}(s-1)^{j-\kappa-m}+O_{m,J}\big((\log Q)^{J+2k}|s-1|^{J-\mathrm{Re}(\kappa)-m}\big).$$


Proof of Theorem 13.2. All implied constants might depend on $k$, $J$ and $\epsilon$. In addition, they might depend on the size of $\kappa$. However, for (13.1) and (13.2) both to hold, we must have $|\kappa|\le k$, so the dependence on $\kappa$ can be absorbed into the dependence on $k$. The implied constants will also depend on an integer $m$ that we will choose later in terms of $k$, $J$ and $\epsilon$.

Instead of estimating the partial sums of $f$, we work with the function $f(n)(\log n)^m$. We will use a smooth variant of Perron inversion to rewrite the partial sums of this function. Let


$$T=(\log x)^{2k+J+1}\quad\text{and}\quad w(s)=T\cdot\frac{(1+1/T)^{s+1}-1}{s+1}. \tag{13.15}$$

Then (7.3) implies that

$$\sum_{n\le x}f(n)(\log n)^m=\frac{(-1)^m}{2\pi i}\int_{\mathrm{Re}(s)=1+1/\log x}F^{(m)}(s)w(s)\frac{x^s}{s}\,ds+R, \tag{13.16}$$

where $|R|\le\sum_{x<n\le x+x/T}|f(n)|(\log n)^m$. Since $|f|\le\tau_k$ and $T$ is a power of $\log x$, Theorem 7.4 implies that

$$|R|\ll x(\log x)^{m+k-1}/T\le x(\log x)^{m-k-1-J}. \tag{13.17}$$


Next, we turn to the main term in (13.16), which we write as $I_1+I_2+I_3$. Here $I_1$ denotes the portion of the integral with $|\mathrm{Im}(s)|\le1/\log Q$, $I_2$ the portion with $1/\log Q<|\mathrm{Im}(s)|\le T^2$, and $I_3$ the remaining part.

First, we bound $I_3$. For $s=1+1/\log x+it$, we note that $|F^{(m)}(s)|\le\sum_{n\ge1}|f(n)|(\log n)^m/n^{1+1/\log x}$. Using our assumption that $|f|\le\tau_k$ and Theorem 7.4 (which we insert via partial summation), we infer that $F^{(m)}(s)\ll(\log x)^{m+k}$. In addition, we have $|w(s)|\ll T/|t|$. As a consequence,

$$I_3\ll x(\log x)^{m+k}/T=x(\log x)^{m-J-k-1} \tag{13.18}$$

by the choice of $T$.


To bound $I_2$, suppose that $1/\log Q<|t|\le T^2$ and that $m$ is large enough. Then $w(s)\ll T$. Moreover, $F^{(m)}(s)\ll|t|^{1/2}+(\log Q)^{m+2k}\ll(\log x)^{m-J-k-1}/T^3$ by Lemma 13.5(c), since $\log x\ge(\log Q)^{1+\epsilon}$. Consequently,

$$I_2\ll x(\log x)^{m-J-k}. \tag{13.19}$$


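It remains to estimate $I_1$. We use Lemma 13.5(d) to find that

$$I_1=\sum_{0\le j<J}c_j\frac{\Gamma(\kappa-j+m)}{\Gamma(\kappa-j)}\cdot\frac{1}{2\pi i}\int_{\mathcal L}\frac{w(s)x^s}{s(s-1)^{\kappa+m-j}}\,ds+O\Big(x(\log Q)^{J+2k}\int_{|t|\le1/\log Q}|s-1|^{J-\mathrm{Re}(\kappa)-m}\,dt\Big),$$

where $\mathcal L$ denotes the vertical line segment $[1+1/\log x-i/\log Q,\,1+1/\log x+i/\log Q]$. For $s\in\mathcal L$ and $j\in\mathbb Z\cap[0,J]$, we have $w(s)=1+O(1/T)$ and

$$|(s-1)^{j-\kappa-m}|\ll|s-1|^{j-\mathrm{Re}(\kappa)-m}\le(\log x)^{m+\mathrm{Re}(\kappa)-j}.$$

Consequently,

$$I_1=\sum_{0\le j<J}c_j\frac{\Gamma(\kappa-j+m)}{\Gamma(\kappa-j)}\cdot\frac{1}{2\pi i}\int_{\mathcal L}\frac{x^s}{s(s-1)^{\kappa+m-j}}\,ds+O(E),$$

where

$$E=x(\log x)^{m+\mathrm{Re}(\kappa)-J-1}(\log Q)^{J+2k}.$$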


Assuming that $m\ge J+2+k\ge J+2+|\kappa|$, we have

$$\int_{\substack{\mathrm{Re}(s)=1+1/\log x\\ |t|>1/\log Q}}\Big|\frac{x^s}{s(s-1)^{\kappa+m-j}}\Big|\,|ds|\ll x(\log Q)^{m+\mathrm{Re}(\kappa)-j-1}\qquad(0\le j<J).$$


Thus, Lemma implies that 


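$$\frac{1}{2\pi i}\int_{\mathcal L}\frac{x^s}{s(s-1)^{\kappa+m-j}}\,ds=\int_1^x\frac{(\log y)^{\kappa+m-j-1}}{\Gamma(\kappa+m-j)}\,dy+O\big(x(\log Q)^{m+\mathrm{Re}(\kappa)-j-1}\big),$$

so that

$$I_1=\int_1^xP(\log y)(\log y)^m\,dy+O(E)\quad\text{with}\quad P(u)=\sum_{0\le j<J}\frac{c_j}{\Gamma(\kappa-j)}u^{\kappa-j-1}.$$

Combining this formula with (13.16) through (13.19), we deduce that

$$\sum_{n\le x}f(n)(\log n)^m=\int_1^xP(\log y)(\log y)^m\,dy+O\big(E+x(\log x)^{m-J-k}\big).$$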


Finally, we remove the weight $(\log n)^m$ with a simple partial summation argument to establish (13.10). Relation (13.11) then follows by expanding $\int_2^x(\log y)^{\beta-1}/\Gamma(\beta)\,dy$ into an asymptotic series. This uses integration by parts several times, much like we did in Example 1.6.


Proof of Lemma 13.5. (a) The function $\zeta(s)(s-1)$ is analytic and non-zero in an open neighborhood of the half-plane $\mathrm{Re}(s)\ge1$. Hence, it suffices to show that $F\zeta^{-\kappa}$ has a $C^\infty$-extension to the half-plane $\mathrm{Re}(s)\ge1$. We write $F\zeta^{-\kappa}=GH$, where

$$G(s)=\prod_p\Big(1-\frac1{p^s}\Big)^\kappa e^{f(p)/p^s}\quad\text{and}\quad H(s)=\prod_p\Big(1+\frac{f(p)}{p^s}+\frac{f(p^2)}{p^{2s}}+\cdots\Big)e^{-f(p)/p^s}.$$

The factors of $H(s)$ are $1+O(1/p^{2\sigma})$ by Taylor's theorem and (13.2). In particular, $H(s)$ is analytic for $\sigma>1/2$, and each derivative $H^{(m)}(s)$ is uniformly bounded in the half-plane $\sigma\ge1$. To establish the $C^\infty$-extension of $G$, it is more convenient to work with its logarithm, for which we have

$$\log G(s)=\sum_p\frac{f(p)-\kappa}{p^s}-\kappa\sum_{p,\ a\ge2}\frac{1}{ap^{as}}. \tag{13.20}$$

The series on the right-hand side converges uniformly in compact subsets of the half-plane $\mathrm{Re}(s)\ge1$ by (13.1) and partial summation (see the proof of Theorem 4.5). This completes the proof of part (a).


(b, c) By the above discussion, both parts will follow if we show that

$$G^{(m)}(s)\ll_{m,\epsilon}\max\{|t|^\epsilon,\log Q\}^{m+2k}\qquad(\sigma\ge1,\ t\in\mathbb R). \tag{13.21}$$

Indeed, all derivatives of $(\zeta(s)(s-1))^\kappa$ are bounded in the vicinity of $s=1$. Together with (13.21), this yields part (b). To prove (c), we separate two cases. When $1/\log Q\le|t|\le1$, we note that $(\zeta^\kappa)^{(m)}(s)\ll_m(\log Q)^{m+k}$. When $|t|\ge1$, we use the bound $(\zeta^\kappa)^{(m)}(s)\ll|t|^\epsilon$, which is a consequence of Exercise 8.4(c). Together with (13.21), these estimates establish part (c) in all cases.

Let us now prove (13.21). We write $G=e^L$ and note that $G'=L'e^L$ and $G''=L''e^L+(L')^2e^L$. In general, $G^{(m)}$ is a finite linear combination of terms of the form $L^{(m_1)}\cdots L^{(m_r)}e^L$ with $m_1+\cdots+m_r=m$ and $m_1,\dots,m_r\in\mathbb Z_{\ge1}$. This reduces (13.21) to proving that

$$|L(s)|\le2k\log\log N+O_\epsilon(1),\qquad L^{(m)}(s)\ll_{m,\epsilon}(\log N)^m\quad(m\ge1) \tag{13.22}$$

for $\sigma\ge1$ and $t\in\mathbb R$, where $N=\exp(\max\{|t|^\epsilon,\log Q\})$.


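To prove (13.22), we adapt the proof of Theorem 11.2. We fix $A>\max\{m,\epsilon^{-1}\}$, and use partial summation and (13.1) to find that

$$\sum_{p>N}\frac{(f(p)-\kappa)(-\log p)^m}{p^s}\ll_{m,A}\big(1+|t|/(\log N)^A\big)(\log N)^m\le2(\log N)^m.$$

Since $|f(p)-\kappa|\le2k$ by (13.2), we also trivially have that

$$\Big|\sum_{p\le N}\frac{(f(p)-\kappa)(-\log p)^m}{p^s}\Big|\le\begin{cases}2k\log\log N+O(1)&\text{if }m=0,\\ O((\log N)^m)&\text{if }m\ge1,\end{cases}$$

as well as

$$\sum_{p,\ a\ge2}\frac{|\kappa|(a\log p)^m}{ap^{a\sigma}}\ll_m1.$$

This completes the proof of (13.22), and hence of (13.21).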
(d) Taylor's theorem implies that

$$\tilde F(s):=F(s)(s-1)^\kappa=\sum_{0\le j<J}c_j(s-1)^j+E(s), \tag{13.23}$$

where the remainder can be written as

$$E(s)=\int_1^s\tilde F^{(J)}(z)\frac{(s-z)^{J-1}}{(J-1)!}\,dz.$$

Dividing both sides of (13.23) by $(s-1)^\kappa$ and differentiating $m$ times, we see that part (d) will follow if we can show that

$$E^{(\ell)}(s)\ll_{\ell,J}|s-1|^{J-\ell}(\log Q)^{J+2k}\qquad(\ell\ge0) \tag{13.24}$$

when $|s-1|\le2/\log Q$ and $\sigma>1$. Indeed, by induction on $\ell$, we have

$$E^{(\ell)}(s)=\begin{cases}\displaystyle\int_1^s\tilde F^{(J)}(z)\frac{(s-z)^{J-1-\ell}}{(J-1-\ell)!}\,dz&\text{if }\ell\le J-1,\\ \tilde F^{(\ell)}(s)&\text{if }\ell\ge J.\end{cases}$$

Since $\tilde F^{(\ell)}(z)\ll_\ell(\log Q)^{\ell+2k}$ for $z\in[1,s]$ by (13.21), we find that

$$E^{(\ell)}(s)(s-1)^\ell\ll(|s-1|\log Q)^{\max\{J,\ell\}}(\log Q)^{2k}\ll|s-1|^J(\log Q)^{J+2k}$$

for $|s-1|\le2/\log Q$. This shows (13.24), and thus part (d) of the lemma.


Exercises 
Exercise 13.1. Let « € C. Estimate )7,<, Ke") and Vice KY) a(n). 


Exercise 13.2. Fix « > 0 and A > 1. Prove that, uniformly for m € N and 
x >2+exp{(logw(m))!**}, we have 


HeeeGi@le-2)] (ins) +0a( aoa): 


Exercise 13.3. Fix $\kappa\in\mathbb C$ and $\epsilon>0$. Given $m\in\mathbb N$, let $L(m)=\log(2+\omega(m))$. Uniformly for $m\in\mathbb N$ and $x\ge\exp\{L(m)^{1+\epsilon}\}$, prove that


» Kelr) — ole”) (log a)*—) + On e(eL(m)?!*l (log x)Rel*)—?), 
K 
nXa,(nym)=1 


where $c_\kappa=\prod_p(1+\frac{\kappa}{p-1})(1-1/p)^\kappa$ and $f(m)=\prod_{p\mid m}(1+\kappa/(p-1))^{-1}$.


Exercise 13.4 (Landau). Let $b(n)$ be the indicator function of those $n\in\mathbb N$ that can be written as the sum of two squares. Prove that there is a constant $c>0$ such that $\sum_{n\le x}b(n)\sim cx/\sqrt{\log x}$ as $x\to\infty$. [Hint: $b(n)=1$ if and only if $\nu$ is even whenever $p^\nu\,\|\,n$ with $p\equiv3\pmod4$.]
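The criterion in the hint can be tested numerically. The Python sketch below is our own illustration, not part of the exercise, and the function names are hypothetical. It compares the hint's criterion with a brute-force search for representations $n=a^2+b^2$. It also computes the normalized count $x^{-1}\sqrt{\log x}\sum_{n\le x}b(n)$, which the exercise predicts converges to $c$.

```python
from math import isqrt, log, sqrt


def is_sum_of_two_squares(n):
    """Brute force: is n = a^2 + b^2 for some integers a, b >= 0?"""
    for a in range(isqrt(n) + 1):
        r = n - a * a
        if isqrt(r) ** 2 == r:
            return True
    return False


def b_by_hint(n):
    """Hint's criterion: every prime p = 3 (mod 4) with p^v || n has v even."""
    m, p = n, 2
    while p * p <= m:
        v = 0
        while m % p == 0:
            m //= p
            v += 1
        if p % 4 == 3 and v % 2 == 1:
            return False
        p += 1
    # A leftover m > 1 is a prime that divides n exactly once.
    return not (m > 1 and m % 4 == 3)


def landau_ratio(x):
    """(sum_{n <= x} b(n)) * sqrt(log x) / x, which should tend to c."""
    count = sum(1 for n in range(1, x + 1) if b_by_hint(n))
    return count * sqrt(log(x)) / x
```

Expect slow convergence of `landau_ratio`. It is meant only as a sanity check of the hint, not as evidence for the numerical value of $c$.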


Exercise 13.5. For $r>0$ and $\epsilon>0$, we let $C_r(\epsilon)$ be the contour $\{|s|=r:|\arg(s)|\le\pi-\epsilon\}$ traced counterclockwise. We then define the contour

$$H_r(\epsilon)=(-\infty-ir\sin\epsilon,\,re^{-i(\pi-\epsilon)}]+C_r(\epsilon)+[re^{i(\pi-\epsilon)},\,-\infty+ir\sin\epsilon).$$

The limit of $H_r(\epsilon)$ as $\epsilon\to0^+$ is denoted by $H_r$ and is called a Hankel contour (see Figure 13.1). By convention, $\int_{H_r}=\lim_{\epsilon\to0^+}\int_{H_r(\epsilon)}$. Prove that

$$\frac{1}{2\pi i}\int_{H_r}\frac{e^s}{s^\kappa}\,ds=\frac{1}{\Gamma(\kappa)}.$$

[Hint: Show that both sides are entire functions of $\kappa$.]




Figure 13.1. A Hankel contour 


Exercise 13.6* ([162], [172, Chapter II.5]). Fix $\kappa\in\mathbb C$. Let $x\ge3$, $\alpha=1+1/\log x$ and $T\in[100,e^{\sqrt{\log x}}]$, and define $w(s)$ by (13.15).

(a) Prove that
$$\sum_{n\le x}\tau_\kappa(n)=\frac{1}{2\pi i}\int_{\substack{\mathrm{Re}(s)=\alpha\\ |\mathrm{Im}(s)|\le T^2}}\zeta(s)^\kappa w(s)\frac{x^s}{s}\,ds+O_\kappa\Big(\frac{x(\log x)^{|\kappa|}}{T}\Big).$$

(b) Let $\delta=0.5c/\log(2+T)$ with $c$ as in Exercise 8.4. In addition, consider $r\in(0,\delta)$ and let $H_r'$ denote the truncated Hankel contour defined as follows. It goes from $-\delta-i0^+$ to $-r-i0^+$, then traces a circle of radius $r$ to $-r+i0^+$, and finally goes to $-\delta+i0^+$. Prove that
$$\sum_{n\le x}\tau_\kappa(n)=\frac{1}{2\pi i}\int_{H_r'}\zeta(s+1)^\kappa\frac{x^{s+1}}{s+1}\,ds+R,$$
where $|R|\ll x(\log x)^{O(1)}(1/T+x^{-\delta})$. [Hint: After making the change of variables $s\mapsto s+1$ in part (a), replace the contour $[1/\log x-iT^2,1/\log x+iT^2]$ by the contour of Figure 13.2.]

(c) Develop $\zeta(s+1)^\kappa s^\kappa/(s+1)$ into a Taylor series about $s=0$ to give a new proof of Theorem 13.2 when $f=\tau_\kappa$.


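Exercise 13.7*.

(a) When $\sigma\in(0,1)$, show that $\log\zeta(\sigma+i\epsilon)\to\log|\zeta(\sigma)|-i\pi$ as $\epsilon\to0^+$. [Hint: First, analyze $\log\zeta(s)$ when $s\approx1$.]

(b) Use Exercise 13.6(b) to show that there is a constant $c>0$ such that
$$\sum_{n\le x}\tau_{1/2}(n)=\frac1\pi\int_{1/2}^1|\zeta(\sigma)|^{1/2}\frac{x^\sigma}{\sigma}\,d\sigma+O\big(xe^{-c\sqrt{\log x}}\big)\qquad(x\ge2).$$

(c) Assume the Riemann Hypothesis and fix $\epsilon>0$. Show that
$$\sum_{n\le x}\tau_{1/2}(n)=\frac1\pi\int_{1/2}^1|\zeta(\sigma)|^{1/2}\frac{x^\sigma}{\sigma}\,d\sigma+O_\epsilon(x^{1/2+\epsilon})\qquad(x\ge2).$$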


Figure 13.2. Deforming the contour $[1/\log x-iT^2,1/\log x+iT^2]$ (into the rectangle with corners $-\delta\pm iT^2$ and $1/\log x\pm iT^2$).

Exercise 13.8* (A partial converse to Theorem 13.2 [120]). Let $f$ be a multiplicative function such that $|f|\le1$ and

$$\sum_{n\le x}f(n)\ll_A x/(\log x)^A\qquad(x\ge2) \tag{13.25}$$

for all $A>0$. Assume further that there is some $\delta>0$ such that

$$\sum_{p\le x}\frac{\mathrm{Re}(f(p)p^{-it})}{p}\ge(-1+\delta)\log\log x+O_B(1)\qquad(|t|\le(\log x)^B,\ x\ge2)$$

for each fixed $B>0$. Then prove that

$$\sum_{p\le x}f(p)\log p\ll_C x/(\log x)^C\qquad(x\ge2)$$
for each fixed C > 0. [Hint: Let F(s)= S07, f(n)/n*. Show that 7, L{pitifese 
= O(1) and Dc, (p'/'°&* — 1)/p = O(1), and thus |F(1 + 1/loga + it)| = 
exp{) p<2 Re(f(p)p~")/p} > (log x)? for |t| < (log a)”. Conclude that |F(1+ 
1/log a + it)| >p (logx)>—! for |t| < (log)? and hence (F’/F)°™(1 + 1/log a + 
it) < (log x)™/? for large m € N.] 


Chapter 14 


Evolution of sums of 
multiplicative functions 


The LSD method allows us to handle partial sums of multiplicative functions whose prime values are very regular. However, the information we have at our disposal is often more limited. In this chapter, we develop a technique that gives us a hold on partial sums of multiplicative functions under much weaker conditions. We mainly focus on non-negative functions. They are easier to handle while still forming a large enough class. For more advanced topics, see [38, Chapters 6 and 9], [75] and [172, Chapter III.4].


The underlying principle of the method we will use is very simple. Suppose we know the average behavior of $f$ over integers $n\le x/2$ and over prime powers $p^\nu\le x$. Then we also know the average behavior of $f$ over integers $m\le x$. Indeed, any integer $1<m\le x$ can be written as $m=p^\nu n$, with $p^\nu\le x$ and $n$ an integer $\le x/p^\nu\le x/2$ that is coprime to $p$. In this case, we also have $f(m)=f(n)f(p^\nu)$. This simple fact should in principle imply that $S(x)=\sum_{m\le x}f(m)$ obeys a recurrence relation involving the quantities $S(y)$ with $y\le x/2$ and the numbers $f(p^\nu)$ with $p^\nu\le x$.


An elegant way to derive the claimed recurrence begins with the obvious identity $F'=(F'/F)\cdot F$, where $F$ is the Dirichlet series of $f$. Hence

$$f\cdot\log=\Lambda_f*f, \tag{14.1}$$

where $\Lambda_f$ is the arithmetic function associated to the Dirichlet series $-F'/F$. This function generalizes von Mangoldt's function $\Lambda$, which satisfies (14.1) with $f=1$. For instance, the definition of $\Lambda_f$ readily implies that

$$\Lambda_f(p)=f(p)\log p.$$

In addition, similarly to $\Lambda$, the function $\Lambda_f$ is supported on prime powers (see Exercise 4.7).


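Example 14.1. (a) Let $\kappa\in\mathbb C$. If $\tau_\kappa$ is the function defined in (13.3), then $F(s)=\zeta(s)^\kappa$, whence $F'/F=\kappa\zeta'/\zeta$. Consequently, $\Lambda_{\tau_\kappa}=\kappa\Lambda$.

(b) If $f=\mu^2\kappa^\omega$, then we have $F(s)=\prod_p(1+\kappa/p^s)$, whence $\log F(s)=\sum_p\sum_{m\ge1}(-1)^{m-1}\kappa^m/(mp^{ms})$. We infer that $\Lambda_f(p^m)=(-1)^{m-1}\kappa^m\log p$, which grows exponentially fast in $m$ when $|\kappa|>1$.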


Now, using (14.1), we arrive at the formula

$$\sum_{n\le x}f(n)\log n=\sum_{a\le x}\Lambda_f(a)\sum_{b\le x/a}f(b). \tag{14.2}$$

The slow growth of the logarithm implies that the left-hand side is $\approx(\log x)\sum_{n\le x}f(n)$. Hence, (14.2) allows us to write $S(x)=\sum_{n\le x}f(n)$ as a weighted average of $S(y)$ with $y\le x/2$ (recall that $\Lambda_f(1)=0$). The weights are controlled by the values $f(p^\nu)$ with $p^\nu\le x$. This establishes the claimed recurrence for the partial sums of any multiplicative function. We give two applications of this fundamental principle of multiplicative functions in Theorems 14.2 and 14.3.
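Identity (14.1) determines $\Lambda_f$ recursively. The reason is that the term $d=n$ of $\sum_{d\mid n}\Lambda_f(d)f(n/d)$ equals $\Lambda_f(n)f(1)=\Lambda_f(n)$. The Python sketch below solves for $\Lambda_f$ in this way. It is an illustration only: the helper names and the test choice $\kappa=2$ are ours, and trial division limits it to small $n$. It checks the prime-power values from Example 14.1(b) and confirms that the two sides of (14.2) agree up to floating-point rounding.

```python
import math


def factorize(n):
    """Prime factorization {p: v} by trial division (adequate for small n)."""
    fac, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            fac[p] = fac.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        fac[n] = fac.get(n, 0) + 1
    return fac


def make_f(kappa):
    """Hypothetical test function f = mu^2 * kappa^omega from Example 14.1(b)."""
    def f(n):
        fac = factorize(n)
        return 0.0 if any(v > 1 for v in fac.values()) else kappa ** len(fac)
    return f


def lambda_f(f, N):
    """Lambda_f(1..N) from (14.1): f(n) log n = sum_{d | n} Lambda_f(d) f(n/d), with f(1) = 1."""
    lam = [0.0] * (N + 1)  # Lambda_f(1) = 0
    for n in range(2, N + 1):
        s = f(n) * math.log(n)
        for d in range(2, n):  # proper divisors d >= 2
            if n % d == 0:
                s -= lam[d] * f(n // d)
        lam[n] = s  # what remains is the d = n term, Lambda_f(n) * f(1)
    return lam


def check_14_2(f, lam, x):
    """Both sides of (14.2): sum f(n) log n and sum_a Lambda_f(a) sum_{b <= x/a} f(b)."""
    lhs = sum(f(n) * math.log(n) for n in range(1, x + 1))
    S = [0.0] * (x + 1)  # S[y] = sum_{b <= y} f(b)
    for n in range(1, x + 1):
        S[n] = S[n - 1] + f(n)
    rhs = sum(lam[a] * S[x // a] for a in range(2, x + 1))
    return lhs, rhs
```

For example, `lambda_f(make_f(2.0), 300)[16]` should be close to $-16\log 2$, in line with Example 14.1(b).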


Before we continue, we need to make some technical preparations. We face the problem that $\Lambda_f$ can grow very rapidly even if $f$ is divisor-bounded (i.e., $|f|\le\tau_k$ for some $k>0$). For instance, when $f=\mu^2\kappa^\omega$ with $|\kappa|>1$, Example 14.1(b) shows that $\Lambda_f(p^m)$ grows exponentially fast in $m$. On the other hand, we saw in Example 14.1(a) that $\Lambda_f$ is very tame when $f=\tau_\kappa$. Motivated by these observations, we introduce the function $\tau_f$. It is the arithmetic function whose formal Dirichlet series is $\prod_p(1-1/p^s)^{-f(p)}$. In other words, $\tau_f$ is the multiplicative function with

$$\tau_f(p^m)=\binom{f(p)+m-1}{m}=\frac{f(p)(f(p)+1)\cdots(f(p)+m-1)}{m!} \tag{14.3}$$

for all prime powers $p^m$.
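The values in (14.3) are the binomial-series coefficients of $(1-z)^{-f(p)}$ with $z=p^{-s}$. They can therefore be generated from the ratio $\tau_f(p^{m+1})/\tau_f(p^m)=(f(p)+m)/(m+1)$. In the Python sketch below (helper names are hypothetical), a Dirichlet convolution of multiplicative functions is computed one prime at a time, as a Cauchy product of power series in $p^{-s}$. This gives a numerical check that the local factors attached to $f(p)$ and $-f(p)$ multiply to 1. That is the statement, made in the text, that $\tau_{-f}$ is the Dirichlet inverse of $\tau_f$.

```python
def tau_local(t, M):
    """[tau_f(p^m) for m = 0, ..., M] when f(p) = t, following (14.3).

    Works for real or complex t.
    """
    coeffs, c = [], 1.0
    for m in range(M + 1):
        coeffs.append(c)
        c = c * (t + m) / (m + 1)  # tau_f(p^{m+1}) = tau_f(p^m) * (t + m) / (m + 1)
    return coeffs


def cauchy_product(a, b):
    """Local factor at one prime of a Dirichlet convolution of multiplicative functions:
    the product of two power series in z = p^{-s}, truncated to min(len(a), len(b)) terms."""
    n = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]
```

The same product also shows that $\tau_f*\tau_g=\tau_{f+g}$ prime by prime, since $(1-z)^{-s}(1-z)^{-t}=(1-z)^{-(s+t)}$.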


Working with $\tau_f$ alleviates various technicalities. For instance, we immediately see that its Dirichlet inverse equals $\tau_{-f}$, and that $\Lambda_{\tau_f}(p^m)=f(p)\log p$ for all prime powers $p^m$. Moreover, we can easily relate $f$ and $\tau_f$ with a simple convolution trick: if we write

$$f=\tau_f*r_f,$$

then the function $r_f$ is supported on square-full integers. Since these numbers are very sparse (see Exercise 1.6), $r_f$ often satisfies the inequality

$$\sum_{n\le x}|r_f(n)|\ll x^{1-\delta}\qquad(x\ge1) \tag{14.4}$$

for some fixed $\delta>0$. If, for instance, $|f|\le\tau_k$ as in Chapter 13, then $|r_f|=|f*\tau_{-f}|\le\tau_{2k}$. Hence (14.4) holds for any $\delta<1/2$ (see Exercises 2.9(f) and 1.6(b)). Exercise 14.6 establishes (14.4) in many more cases.


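Assuming (14.4), we use Dirichlet's hyperbola method to find that

$$\sum_{n\le x}f(n)=\sum_{a\le y}r_f(a)\sum_{b\le x/a}\tau_f(b)+R \tag{14.5}$$

for all $x\ge y\ge1$, where

$$R=\sum_{b\le x/y}\tau_f(b)\sum_{y<a\le x/b}r_f(a)\ll\sum_{b\le x/y}|\tau_f(b)|\Big(\frac xb\Big)^{1-\delta}\le\frac{x}{y^\delta}\sum_{b\le x/y}\frac{|\tau_f(b)|}{b}.$$

If we extend the summation to all integers $b$ all of whose prime factors are $\le x$, we arrive at the bound

$$R\ll\frac{x}{y^\delta}\prod_{p\le x}\Big(1+\frac{|\tau_f(p)|}{p}+\frac{|\tau_f(p^2)|}{p^2}+\cdots\Big)\le\frac{x}{y^\delta}\prod_{p\le x}\Big(1-\frac1p\Big)^{-|f(p)|}, \tag{14.6}$$

For the last step we used the inequality $|\tau_f(p^m)|\le\tau_{|f|}(p^m)$ and Taylor's theorem. If we also know that $|f|\le\tau_k$, then $R\ll_kxy^{-\delta}(\log x)^k$, and we can take any $\delta<1/2$, as discussed above. Exercise 14.6 establishes similar results when $f$ satisfies weaker versions of the growth inequality $|f|\le\tau_k$.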


Our first application of (14.2) is a general-purpose upper bound for the partial sums of non-negative multiplicative functions. We will show that, under some mild conditions, the mean value $x^{-1}\sum_{n\le x}f(n)$ is controlled by the logarithmic mean value $(\log x)^{-1}\sum_{n\le x}f(n)/n$. This is a rather special property of multiplicative functions (cf. Exercise 1.8(b)).

Notice that Theorem 14.2 below is sharp in the generality in which it is stated: taking $f=\tau_k$ makes both sides of size $\asymp_kx(\log x)^{k-1}$. Notice also that Theorem 14.2 can be used to prove upper bounds when sieving the integers $n\le x$ with a set of primes $\mathcal P$ (see Exercise 14.4).


Theorem 14.2. If $f$ is a multiplicative function such that $0\le f\le\tau_k$, then

$$\sum_{n\le x}f(n)\ll_kx\exp\Big\{\sum_{p\le x}\frac{f(p)-1}{p}\Big\}.$$


Proof. We first prove the theorem in the special case when $\tau_f=f$. In particular, we have $\Lambda_f(p^m)=f(p)\log p$, so that $\Lambda_f\le k\Lambda$. Together with (14.2) and Chebyshev's estimate, this implies that

$$\sum_{n\le x}f(n)\log n\le k\sum_{a\le x}f(a)\sum_{b\le x/a}\Lambda(b)\ll kx\sum_{a\le x}\frac{f(a)}{a}.$$

We could now use partial summation to remove $\log n$ from the leftmost sum. More simply, note that

$$(\log x)\sum_{n\le x}f(n)=\sum_{n\le x}f(n)\log n+\sum_{n\le x}f(n)\log(x/n)\le\sum_{n\le x}f(n)\log n+\sum_{n\le x}f(n)\frac xn\ll_kx\sum_{n\le x}\frac{f(n)}n.$$

Hence, $\sum_{n\le x}f(n)\ll_k(x/\log x)\sum_{n\le x}f(n)/n$. To complete the proof, we use the idea leading to (14.6). We have that

$$\sum_{n\le x}\frac{f(n)}n\le\sum_{p\mid n\Rightarrow p\le x}\frac{f(n)}n=\prod_{p\le x}\Big(\sum_{m=0}^\infty\frac{f(p^m)}{p^m}\Big)=\prod_{p\le x}\Big(1-\frac1p\Big)^{-f(p)}.$$

By Taylor's theorem applied to the function $\log(1-t)$, we have $(1-t)^{-1}\le e^{t+t^2}$ for $t\in[0,1/2]$. Since also $f(p)\le k$ for all $p$ by assumption, we conclude that

$$\sum_{n\le x}\frac{f(n)}n\le e^{ck}\exp\Big\{\sum_{p\le x}\frac{f(p)}p\Big\}, \tag{14.7}$$

where $c=\sum_p1/p^2$. Mertens' second estimate (Theorem 3.4(b)) then completes the proof when $f=\tau_f$.


Finally, we consider the general case. As discussed above, (14.4) holds for any $\delta<1/2$ when $|f|\le\tau_k$. Moreover, the remainder term $R$ in (14.5) satisfies the bound $R\ll_kxy^{-\delta}(\log x)^k$. Taking $y=\sqrt x$ implies that

$$\sum_{n\le x}f(n)=\sum_{a\le\sqrt x}r_f(a)\sum_{b\le x/a}\tau_f(b)+O_k\Big(\frac x{\log x}\Big)\ll_k\sum_{a\le\sqrt x}|r_f(a)|\frac xa\exp\Big\{\sum_{p\le x/a}\frac{f(p)-1}p\Big\}+\frac x{\log x}.$$

For the second step, we applied the special case to $\tau_f$, which satisfies $0\le\tau_f\le\tau_k$. The term $x/\log x$ is $\ll x\exp\{\sum_{p\le x}(f(p)-1)/p\}$ because $f\ge0$ here. For the sum over $a$, note that Mertens' second estimate implies that $\sum_{x/a<p\le x}1/p=O(1)$ when $a\le\sqrt x$. Moreover, $\sum_a|r_f(a)|/a$ converges by (14.4) and partial summation. The claimed estimate on $\sum_{n\le x}f(n)$ thus follows.


Our second application of (14.2) is a result due to Wirsing [186, 187] that should be compared with Theorem 13.2. The idea underlying its proof is that if $f(p)\approx\kappa$ on average, then (14.2) implies that $\sum_{n\le x}f(n)$ satisfies an approximate differential equation. For technical reasons, it is much easier to work with logarithmic averages.


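Theorem 14.3. Fix $k>0$, $c\in[0,1)$ and $\kappa\in\mathbb C$, and consider a multiplicative function $f$ such that $|f|\le\tau_k$,

$$\sum_{p\le x}\frac{f(p)\log p}p=\kappa\log x+O(1)\qquad(x\ge2) \tag{14.8}$$

and

$$\sum_{p\le x}\frac{|f(p)|-\mathrm{Re}(f(p))}p\le c\log\log x+O(1)\qquad(x\ge2). \tag{14.9}$$

We then have

$$\sum_{n\le x}\frac{f(n)}n=\frac{\mathcal G(f)}{\Gamma(\kappa+1)}(\log x)^\kappa+O\big((\log x)^{\mathrm{Re}(\kappa)+c-1}\big)\qquad(x\ge2),$$

where $\mathcal G(f)=\prod_p(1-1/p)^\kappa(1+f(p)/p+f(p^2)/p^2+\cdots)$. The implied constant depends at most on $k$, the distance of $c$ from 1, and the implied constants in (14.8) and (14.9).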


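Proof. As in Theorem 14.2, it suffices to consider the case when $f=\tau_f$. Let $S(x):=\sum_{n\le x}f(n)/n$ and note that

$$\sum_{n\le e^w}\frac{f(n)\log n}n=\sum_{m\le e^w}\frac{f(m)}m\sum_{a\le e^w/m}\frac{\Lambda_f(a)}a=\sum_{m\le e^w}\frac{f(m)}m\big(\kappa\log(e^w/m)+O(1)\big) \tag{14.10}$$

for all $w\ge1$. The second step holds by (14.8), since $\Lambda_f(p^j)=f(p)\log p$ and $|f(p)|\le k$, so that higher prime powers contribute $O(1)$. In addition,

$$\sum_{m\le e^w}\frac{f(m)}m\log(e^w/m)=\sum_{m\le e^w}\frac{f(m)}m\int_m^{e^w}\frac{dy}y=\int_1^{e^w}\frac{S(y)}y\,dy$$

by interchanging the order of summation and integration. Lastly, note that

$$\sum_{m\le e^w}\frac{|f(m)|}m\ll w^{\mathrm{Re}(\kappa)+c}\qquad(w\ge1)$$

by (14.7), since $\sum_{p\le x}|f(p)|/p\le(\mathrm{Re}(\kappa)+c)\log\log x+O(1)$ by (14.8) and (14.9). Consequently,

$$\sum_{n\le e^w}\frac{f(n)\log n}n=\kappa\int_1^{e^w}\frac{S(y)}y\,dy+O\big(w^{\mathrm{Re}(\kappa)+c}\big)\qquad(w\ge1). \tag{14.11}$$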


On the other hand, partial summation implies that $\sum_{n\le e^w}f(n)(\log n)/n=wS(e^w)-\int_1^{e^w}S(y)/y\,dy$. Thus

$$wS(e^w)=(\kappa+1)\int_1^{e^w}\frac{S(y)}y\,dy+O\big(w^{\mathrm{Re}(\kappa)+c}\big)\qquad(w\ge1).$$

We bound the part of the integral over $y\in[1,e]$ trivially by $\ll1$. In addition, we let $y=e^u$ and

$$S(e^u)=u^\kappa g(u).$$

Hence, we arrive at the formula

$$w^{1+\kappa}g(w)=(\kappa+1)\int_1^wu^\kappa g(u)\,du+O\big(w^{\mathrm{Re}(\kappa)+c}\big)\qquad(w\ge1).$$

Suppose we had an exact equality $w^{\kappa+1}g(w)=(\kappa+1)\int_1^wu^\kappa g(u)\,du$, with $g$ a differentiable function. Then we would immediately infer that $g'(u)=0$, that is to say, $g$ is a constant function. We will give an asymptotic version of this argument.


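Let

$$E(w)=g(w)-\frac{\kappa+1}{w^{\kappa+1}}\int_1^wu^\kappa g(u)\,du, \tag{14.13}$$

so that

$$E(w)=O(w^{c-1})\qquad(w\ge1). \tag{14.14}$$

We multiply $E(w)$ by $1/w$ and integrate over $w\in[1,z]$ to find that

$$\int_1^z\frac{E(w)}w\,dw=\int_1^z\frac{g(w)}w\,dw-\int_1^zu^\kappa g(u)\int_u^z\frac{\kappa+1}{w^{\kappa+2}}\,dw\,du=\frac1{z^{\kappa+1}}\int_1^zu^\kappa g(u)\,du.$$

Together with (14.13), this implies that

$$g(z)=E(z)+(\kappa+1)\int_1^z\frac{E(w)}w\,dw. \tag{14.15}$$

By (14.14) and our assumption that $c<1$, the integral on the right-hand side of (14.15) converges as $z\to\infty$, and its tails are $\ll z^{c-1}$. Hence

$$g(z)=\lambda+O(z^{c-1})\quad\text{with}\quad\lambda:=(\kappa+1)\int_1^\infty\frac{E(w)}w\,dw. \tag{14.16}$$

Taking $z=\log x$ completes the proof of the theorem, as long as we can show that $\lambda=\mathcal G(f)/\Gamma(\kappa+1)$. To do this, we compute $\mathcal G(f)$ in two ways.

Let $F$ be the Dirichlet series attached to $f$ and note that

$$\mathcal G(f)=\lim_{\sigma\to1^+}F(\sigma)\zeta(\sigma)^{-\kappa}.$$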
ovlt 


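Since $\zeta(\sigma)\sim1/(\sigma-1)$ as $\sigma\to1^+$, we can rewrite the above relation as

$$\mathcal G(f)=\lim_{\sigma\to1^+}F(\sigma)(\sigma-1)^\kappa. \tag{14.17}$$

In addition, partial summation and (14.16) imply that

$$\begin{aligned}F(\sigma)&=(\sigma-1)\int_0^\infty S(e^u)e^{-(\sigma-1)u}\,du\\&=(\sigma-1)\int_0^\infty\big(\lambda u^\kappa+O(u^{\mathrm{Re}(\kappa)+c-1}+1)\big)e^{-(\sigma-1)u}\,du\\&=\big(\lambda\Gamma(\kappa+1)+O((\sigma-1)^{1-c})\big)(\sigma-1)^{-\kappa}.\end{aligned}$$

Since $c<1$, we conclude that $\lim_{\sigma\to1^+}F(\sigma)(\sigma-1)^\kappa=\lambda\Gamma(\kappa+1)$. Comparing this relation with (14.17) yields our claim that $\lambda=\mathcal G(f)/\Gamma(\kappa+1)$.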


Delay differential equations 


Wirsing's theorem capitalizes on the idea that the evolution of the function $S(x)=\sum_{n\le x}f(n)$ is controlled by a differential equation, a consequence of (14.2). In fact, an important aspect of (14.2) is that, since $\Lambda_f(1)=0$, the right-hand side only involves values of $S(t)$ with $t\le x/2$. There is thus a certain delay on the right-hand side of (14.2). This feature is amplified if we consider functions $f$ supported on integers free of prime factors $\le y$. Then the right-hand side of (14.2) only involves values of $S(t)$ with $t<x/y$. The simplest such example is the indicator function of $y$-rough integers, namely integers all of whose prime factors are $>y$. We denote their summatory function by

$$\Phi(x,y)=\#\{n\le x:P^-(n)>y\},$$

where we recall that $P^-(n)$ is the smallest prime factor of $n$, with the convention that $P^-(1)=\infty$.

The function $\Phi(x,y)$ is closely related to the sieve of Eratosthenes-Legendre. In particular, when $y=\sqrt x$, we see that $\Phi(x,\sqrt x)\sim x/\log x$ by the Prime Number Theorem. At the other extreme, consider $y$ tending to infinity at a rate such that $y\le\log x$. Then Theorem 2.1 with $m=\prod_{p\le y}p$ implies that $\Phi(x,y)\sim e^{-\gamma}x/\log y$, since $\omega(m)=\pi(y)\ll\log x/\log\log x$. We want to fill in the gap and understand how $\Phi(x,y)$ evolves when $y$ goes from $\sqrt x$ to $\log x$.

Using (14.2) with $f(n)=1_{P^-(n)>y}$, for which $\Lambda_f(p^k)=1_{p>y}\log p$, we find that

$$\sum_{n\le x,\ P^-(n)>y}\log n=\sum_{p^k\le x,\ p>y}\Phi(x/p^k,y)\log p. \tag{14.18}$$
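Both $\Phi(x,y)$ and identity (14.18) can be checked by brute force for small $x$. The Python sketch below (helper names are hypothetical) sieves smallest prime factors and counts $y$-rough integers directly. It then compares the two sides of (14.18), which is an exact identity, so they should agree up to floating-point rounding.

```python
import math


def spf_table(N):
    """spf[n] = smallest prime factor of n, for 2 <= n <= N."""
    spf = list(range(N + 1))
    for p in range(2, math.isqrt(N) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, N + 1, p):
                if spf[m] == m:
                    spf[m] = p
    return spf


def Phi(x, y, spf):
    """#{n <= x : P^-(n) > y} for integer x >= 1; n = 1 counts since P^-(1) = infinity."""
    return 1 + sum(1 for n in range(2, x + 1) if spf[n] > y)


def sides_of_14_18(x, y, spf):
    """Left side: sum of log n over y-rough n <= x.
    Right side: sum over prime powers p^k <= x with p > y of Phi(x/p^k, y) log p."""
    lhs = sum(math.log(n) for n in range(2, x + 1) if spf[n] > y)
    rhs = 0.0
    for p in range(y + 1, x + 1):
        if spf[p] == p:  # p is a prime > y
            q = p
            while q <= x:
                rhs += Phi(x // q, y, spf) * math.log(p)
                q *= p
    return lhs, rhs
```

For example, $\Phi(100,10)=22$ counts $n=1$ and the 21 primes in $(10,100]$, since $11^2>100$.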


The left-hand side should be roughly $\Phi(x,y)\log x$, while prime powers $p^k$ with $k\ge2$ should not contribute significantly to the right-hand side. We should thus have

$$(\log x)\Phi(x,y)\approx\sum_{y<p\le x}\Phi(x/p,y)\log p.$$

Note that $\Phi(x/p,y)=1$ when $p>x/y$, since the only integer $\le x/p<y$ free of prime factors $\le y$ is the number 1. Consequently,

$$(\log x)\Phi(x,y)\approx x+\sum_{y<p\le x/y}\Phi(x/p,y)\log p. \tag{14.19}$$


Let us pretend for a moment that $x\mapsto\Phi(x,y)$ is a continuously differentiable function. Then the Prime Number Theorem suggests that the sum over $p$ is $\approx\int_y^{x/y}\Phi(x/w,y)\,dw$. Letting $x/w=y^t$ and $u=\log x/\log y$, we find that

$$(\log x)\Phi(x,y)\approx x+x\int_1^{u-1}\frac{\Phi(y^t,y)\log y}{y^t}\,dt. \tag{14.20}$$

This relation suggests that there is a function $B$ such that

$$\Phi(x,y)\sim\frac{B(u)x}{\log y}\qquad(x=y^u,\ u>1,\ y\to\infty). \tag{14.21}$$

For consistency with the estimate $\Phi(x,y)\sim x/\log x$ when $\sqrt x\le y\le x/\log x$, and with the recursive relation (14.20), we must have that

$$B(u)=\frac1u\quad(1\le u\le2),\qquad uB(u)=1+\int_1^{u-1}B(v)\,dv\quad(u\ge2). \tag{14.22}$$

The above relations together define a unique continuous function $B:\mathbb R_{\ge1}\to\mathbb R$ called Buchstab's function. It is usually denoted by $\omega$, but we use the letter $B$ here to avoid confusion with the arithmetic function $\omega(n)$.

For consistency with Theorem 2.1 and Mertens' third estimate, we should have that $\lim_{u\to\infty}B(u)=e^{-\gamma}$. Exercises and give two ways of proving this guess rigorously. For now, we note that $1/u\le B(u)\le1$ for $u\ge1$, as can be seen by (14.22) and induction on $\lfloor u\rfloor$. Moreover, $B$ is differentiable in $(1,2)\cup(2,+\infty)$ and its derivative satisfies the delay differential equation

$$uB'(u)=B(u-1)-B(u)\qquad(u>2).$$

As we will see again later on, the solutions to delay differential equations rule the asymptotic behavior of various sieve-theoretic functions.
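Relation (14.22) can be solved numerically by marching forward in $u$. On a grid of step $h=1/K$, the value at $u$ needs only the integral of $B$ up to $u-1$, which is already known. The Python sketch below does this with the trapezoidal rule. It is our own illustration, and the names are hypothetical. For $2\le u\le3$, (14.22) gives the closed form $B(u)=(1+\log(u-1))/u$, and the sketch is checked against it and against the expected limit $e^{-\gamma}$.

```python
import math

EULER_GAMMA = 0.5772156649015329


def buchstab_table(u_max, K=1000):
    """Approximate Buchstab's function B on the grid u_i = 1 + i/K, 1 <= u_i <= u_max.

    Uses (14.22): B(u) = 1/u on [1, 2] and u B(u) = 1 + int_1^{u-1} B(v) dv for u >= 2.
    The running integral is computed with the trapezoidal rule.
    """
    h = 1.0 / K
    n = int(round((u_max - 1) * K))
    B = [0.0] * (n + 1)
    I = [0.0] * (n + 1)  # I[i] approximates the integral of B over [1, 1 + i*h]
    for i in range(n + 1):
        u = 1 + i * h
        if i <= K:
            B[i] = 1.0 / u
        else:
            B[i] = (1.0 + I[i - K]) / u  # the point u - 1 has grid index i - K
        if i > 0:
            I[i] = I[i - 1] + h * (B[i - 1] + B[i]) / 2
    return h, B


def buchstab(u, table):
    """Look up B(u) at the grid point nearest to u."""
    h, B = table
    return B[int(round((u - 1) / h))]


def buchstab_on_2_3(u):
    """Closed form on [2, 3] implied by (14.22)."""
    return (1 + math.log(u - 1)) / u
```

A fixed step with the trapezoidal rule keeps the sketch short. It is accurate to roughly $h^2$, which is ample for these checks.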


We now prove that (14.21) is indeed true.

Theorem 14.4. Fix $u>1$. If $x=y^u$, then

$$\Phi(x,y)=\frac{B(u)x}{\log y}+O_u\Big(\frac{x}{(\log x)^2}\Big).$$

Proof. We argue by induction on $\lfloor u\rfloor$. When $1<u\le2$, the theorem follows by the Prime Number Theorem. Assume now its validity when $u\le N$ for some $N\in\mathbb Z_{\ge2}$, and consider $u\in(N,N+1]$.


We begin by simplifying (14.18). Note that

$$\sum_{k\ge2}\sum_{\substack{p^k\le x\\ p>y}}\Phi(x/p^k,y)\log p\le\sum_{p>y}\sum_{k\ge2}\frac{x\log p}{p^k}=\sum_{p>y}\frac{x\log p}{p(p-1)}\ll\frac xy,$$

where the last inequality follows by Chebyshev's estimate (Theorem 2.4) and partial summation. In addition, Theorem 14.2 with $f(n)=1_{P^-(n)>y}$ implies that $\Phi(x,y)\ll x/\log y$ for $x\ge y\ge2$. Together with partial summation, this yields the estimate $\sum_{n\le x,\ P^-(n)>y}\log(x/n)\ll x/\log y$. We combine this inequality with relation (14.18) to deduce that

$$(\log x)\Phi(x,y)=\sum_{n\le x,\ P^-(n)>y}\log n+O\Big(\frac x{\log y}\Big)=\sum_{y<p\le x}\Phi(x/p,y)\log p+O\Big(\frac x{\log y}\Big).$$

Since $\Phi(x/p,y)=1$ for $p>x/y$, we conclude that

$$(\log x)\Phi(x,y)=\sum_{y<p\le x/y}\Phi(x/p,y)\log p+x+O\Big(\frac x{\log y}\Big).$$

Finally, note that $x/p<y^{u-1}$ when $p>y$, so that the induction hypothesis can be applied to estimate $\Phi(x/p,y)$. Consequently,

$$(\log x)\Phi(x,y)=\frac x{\log y}\sum_{y<p\le x/y}B\Big(\frac{\log(x/p)}{\log y}\Big)\frac{\log p}p+x+O_u\Big(\frac x{\log y}\Big).$$


Since $B$ is continuous and $t\mapsto\theta(y^t)=\sum_{p\le y^t}\log p$ is a step function with jumps of length $\log p$ whenever $t=\log p/\log y$, we have

$$\sum_{y<p\le x/y}B\Big(\frac{\log(x/p)}{\log y}\Big)\frac{\log p}p=\int_1^{u-1}B(u-t)\frac{d\theta(y^t)}{y^t}. \tag{14.23}$$

Next, we write $\theta(y^t)=y^t(1+\epsilon(y^t))$, and integrate by parts to find that¹

$$\begin{aligned}\int_1^{u-1}B(u-t)\frac{d\theta(y^t)}{y^t}&=(\log y)\int_1^{u-1}B(u-t)\,dt+\Big[B(u-t)\epsilon(y^t)\Big]_{t=1}^{u-1}\\&\quad+\int_1^{u-1}\big(B'(u-t)+B(u-t)\log y\big)\epsilon(y^t)\,dt.\end{aligned} \tag{14.24}$$

¹Strictly speaking, we have to treat separately the integrals over $[1,u']$ and $[u',u-1]$, where $u'=\min\{2,u-1\}$, because of the discontinuity of $B'$ at 2. Formula (14.24) remains valid though.


Since $\epsilon(y^t)\ll1/\log(y^t)$ by the Prime Number Theorem, we conclude that

$$\sum_{y<p\le x/y}B\Big(\frac{\log(x/p)}{\log y}\Big)\frac{\log p}p=(\log y)\int_1^{u-1}B(u-t)\,dt+O_u(1).$$

We plug this formula into the last expression for $(\log x)\Phi(x,y)$ and use (14.22). This proves that $\Phi(x,y)=xB(u)/\log y+O_u(x/\log^2x)$ for $u\in(N,N+1]$, which completes the inductive step and hence the proof of the theorem.


The “dual” of $y$-rough numbers are $y$-smooth numbers², which are integers all of whose prime factors are $\le y$. These numbers also play a central role in number theory, and we will encounter them again when we develop sieve methods in Part 4. Here, we use ideas from the theory of multiplicative functions to study their counting function

$$\Psi(x,y)=\#\{n\le x:P^+(n)\le y\},$$

where we recall that $P^+(n)$ denotes the largest prime factor of $n$, with the convention that $P^+(1)=1$.

Arguing heuristically and writing $x=y^u$ as before, we find that

$$(\log x)\Psi(x,y)\approx\sum_{n\le x,\ P^+(n)\le y}\log n\approx\sum_{p\le y}\Psi(x/p,y)\log p\approx\int_1^y\Psi(x/t,y)\,dt \tag{14.25}$$

$$=x(\log y)\int_{u-1}^u\frac{\Psi(y^v,y)}{y^v}\,dv. \tag{14.26}$$

This relation leads us to conjecture that there is a function $\rho$ such that

$$\Psi(x,y)\sim x\rho(u)\qquad(x=y^u,\ u>0,\ y\to\infty). \tag{14.27}$$


For consistency with the estimate $\Psi(x,y)=\lfloor x\rfloor=x+O(1)$ when $y\ge x$, and with (14.26) when $u>1$, we must have that

$$\rho(u)=1\quad(0\le u\le1),\qquad u\rho(u)=\int_{u-1}^u\rho(v)\,dv\quad(u>1). \tag{14.28}$$

Together, these relations define a unique continuous function $\rho:\mathbb R_{\ge0}\to\mathbb R$ called the Dickman-de Bruijn function. Note that differentiating the second formula in (14.28) yields the delay differential equation

$$u\rho'(u)=-\rho(u-1)\qquad(u>1). \tag{14.29}$$
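Equation (14.29), together with $\rho=1$ on $[0,1]$, determines $\rho$ step by step. The Python sketch below integrates $\rho'(u)=-\rho(u-1)/u$ on a grid with the trapezoidal rule (names are hypothetical). It checks the closed form $\rho(u)=1-\log u$ for $1\le u\le2$, which follows from (14.28), and the integral equation (14.28) at $u=2.5$. It also compares $\rho(3)$ with its value $0.0486083\ldots$.

```python
import math


def dickman_table(u_max, K=1000):
    """Approximate the Dickman-de Bruijn function rho on the grid u_i = i/K, 0 <= u_i <= u_max.

    Starts from rho = 1 on [0, 1] and integrates (14.29), rho'(u) = -rho(u - 1)/u,
    with the trapezoidal rule.
    """
    h = 1.0 / K
    n = int(round(u_max * K))
    rho = [1.0] * (n + 1)
    for i in range(K + 1, n + 1):
        u_prev, u = (i - 1) * h, i * h
        slope_prev = rho[i - 1 - K] / u_prev  # rho(u_prev - 1) / u_prev
        slope = rho[i - K] / u                # rho(u - 1) / u
        rho[i] = rho[i - 1] - h * (slope_prev + slope) / 2
    return h, rho


def rho_at(u, table):
    """Look up rho(u) at the grid point nearest to u."""
    h, rho = table
    return rho[int(round(u / h))]


def rho_on_1_2(u):
    """Closed form on [1, 2] implied by (14.28): rho(u) = 1 - log u."""
    return 1.0 - math.log(u)
```

Unlike $B$, the computed values drop very quickly; already $\rho(5)$ is below $10^{-3}$.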


²The reason for this terminology comes from looking at the sequence of divisors of an integer. If $n$ is $y$-smooth, then every interval $[z,zy]$ with $1\le z\le n/y$ contains a divisor of $n$. On the other hand, if $n$ is $y$-rough, then its divisors can have a much more singular distribution.

Unlike Buchstab's function, which is of order of magnitude 1, the Dickman-de Bruijn function decays extremely rapidly. For instance, Exercise 14.10(c) states that $\rho(u)=e^{-(1+o(1))u\log u}$.

We conclude this chapter with a proof of our guess (14.27).

Theorem 14.5. Fix $u>0$. For $x=y^u$, we have

$$\Psi(x,y)=x\rho(u)+O_u(x/\log x).$$


Proof. In (14.18), we have $x/p^k<y^{u-1}$ throughout the range of summation, whereas in (14.25) we only have the bound $x/p^k\le x/2$. Hence, we cannot use (14.25) to induct on $\lfloor u\rfloor$. We could induct on $\lfloor\log x/\log2\rfloor$ (see Exercise 14.8, where such an induction is performed in another problem). Instead, we present a proof that uses a different recursive formula due to Buchstab.

Note that if $z>y$ and $n$ is $z$-smooth but not $y$-smooth, then $P^+(n)\in(y,z]$. Hence, we may uniquely write $n=pm$, where $p\in(y,z]$ and $P^+(m)\le p$ (we simply take $p=P^+(n)$). This leads us to Buchstab's identity

$$\Psi(x,z)-\Psi(x,y)=\sum_{y<p\le z}\Psi(x/p,p). \tag{14.30}$$

When $y\ge\sqrt x$ and $z=x$, we have $x/p<p$ for all $p>y$, so that (14.30) becomes

$$\Psi(x,y)=\lfloor x\rfloor-\sum_{y<p\le x}\Big\lfloor\frac xp\Big\rfloor=x(1-\log u)+O(x/\log x)$$

by Mertens' and Chebyshev's estimates (see Theorems 3.4(b) and 2.4, respectively). This establishes the theorem for $u\in[1,2]$.

For the general case, we apply (14.30) with $z=\sqrt x$ to find that

$$\Psi(x,y)=x\rho(2)-\sum_{y<p\le\sqrt x}\Psi(x/p,p)+O(x/\log x).$$

Since $\log(x/p)/\log y<u-1$ for $p>y$, we may now induct on $\lfloor u\rfloor$ to establish the theorem, arguing similarly to the proof of Theorem 14.4. We leave the details as an exercise.


Exercises 


Exercise 14.1. Let $f$ be as in Theorem 14.2. Show that
1 -—1 
ye ea el (c > 1). 
n x oa. 


n>x 




[Hint: If M denotes the right-hand side of the above inequality, show that 
a ee f(n)/n? « M-e-5- (14 j/log 2)™*t*-19} uniformly for 7 > 0.] 


Exercise 14.2. For fixed $r\in\mathbb R$ and $k\in\mathbb N$, show that

$$\sum_{n\le x}\tau_k(n)\Big(\frac{\varphi(n)}n\Big)^r\asymp_{r,k}x(\log x)^{k-1}\qquad(x\ge2).$$

[Hint: To get the lower bound, use Theorem 7.4 and Hölder's inequality.]

Exercise 14.3. Let $g$ be such that $\mu^2/\varphi=g*(1/\mathrm{id})$, where $\mathrm{id}(n)=n$.
(a) Calculate g on all prime powers. Then use ‘an aaah show that 


vt 1 
S- lg(n)| < 2 < (x > 1). 
n>x ab2>a Vr 
(b) Prove that there is some constant $c$ such that

$$\sum_{n\le x}\frac{\mu^2(n)}{\varphi(n)}=\log x+c+O\Big(\frac{\log x}{\sqrt x}\Big)\qquad(x\ge2).$$
Exercise 14.4. Uniformly for $m\in\mathbb N$ and $x\ge1$, prove that

$$\#\{n\le x:(n,m)=1\}\ll x\prod_{p\mid m,\ p\le x}(1-1/p).$$


Exercise 14.5. Let f be a multiplicative function with 0 < f < Tp. 
>0 


(a) If g is such that - * g = Tp, then prove that j?(n)g(n) and 
2 2 
1? ( 20% w(n)g(r) 5 a (n)Te(n) ( 
> >) Sees Si). 
nxn nxn um N<u um 


(b) Prove that 


N<ax peu 


(c) If Va<y f(p) = cy/ logy for y € [z, 2], then prove that 


TF) ee exp {7 =H, 


N<u peu 


[Hint: Note that ark (n ) 2 ee f(m) atin f(p).] 
(d) Let $b(n)$ be as in Exercise 13.4. Give a new proof of the estimate $\sum_{n\le x}b(n)\asymp x/\sqrt{\log x}$ $(x\ge2)$.
Exercise 14.6*. Let $f$ be a multiplicative function, and write $f=\tau_f*r_f$.
(a) For each $\epsilon>0$, use Hölder's inequality to prove that
ss Irp(n)| <e gi e( ze) ( o> Ir p(n n)|Lte/n yee 
nxn n<xx 


[Hint: Recall that $r_f$ is supported on square-full integers.]




(b) Assume there are constants k > 0 and ) € [1,2) such that |f(p”)| < kA” for 
all prime powers p”. Prove that |rf(n)| < n° A°™., Deduce that ([4.4) holds 
for some 6 > 0. 

(c) Assume there is $\theta>1$ such that $\sum_{p\le x}\sum_{\nu\ge1}|f(p^\nu)|^\theta/p^\nu\ll\log\log x$ for all $x\ge3$.

(i) Ife < 6—1, prove that 7,5, 7 ¢\(p")'**/p” “eo |f(p)|'t*/p. (Hint: 
Show that f(p) < (ploglog(p + 1))!/° and use Exercise [L.2{e).] 

(ii) Show that |rf(n)|'T© < 7(n)° Vayen Lf (a)|'F 7 7)(b)'F*, and conclude 
that (14.4) holds for any 6 < (1 — 6)/(4 — 26). 

(d) Let $f$ be as in part (c). Assume further that (14.8) and (14.9) hold. Prove that $f$ satisfies the conclusions of Theorems 14.2 and 14.3.

(e) Let f be as in part (c). Assume further that (13-1) holds with Q = 2. Prove 
that f satisfies the conclusions of Theorem[13.2] [Hint: Use Hélder’s inequality 
to bound Dealt |f(n)|-] 

Exercise 14.77 Let « € C, c € [0,1), k, C > 0 and Q > 3. Consider a multiplica- 

tive function f such that tr, = f and |f| <7». Assume further that 


| S- Aner —Klogar| <logQ and S- IF) i < clogloga + C. 


pKu pKu 


for all $x\ge Q$, and let $G(f)$ be as in Theorem 14.3. All implied constants below may depend on $\kappa$, $k$, $c$ and $C$, but they must be uniform in $x$ and $Q$.


(a) For x > Q, prove the following estimates: 
(i) Vepsa(f(p) — *)/p = O(log Q/log x). 
(ii) inca If (m)|/m K |G(f)|(log x) Pe for « > Q. 
(ii) Dace f(n)/n = (G(f)/T(4+1))-(log 2)": (140 (log Q)(log:n)°). [Hint 
Improve (14.14) to E(w) « |G(f)|(log Q)w*! for w > log Q.] 
(b) Prove that lim, B(u) = e77. 


Exercise 14.8* Let $f:\mathbb{N}\to\mathbb{C}$ be a multiplicative function such that $|f|\le\tau_k$ for some $k\in\mathbb{N}$. Assume further that


$$\Big|\sum_{p\le x}f(p)\log p\Big|\le\frac{Mx}{(\log x)^{A}}\qquad(x\ge2)$$
for some $M,A>0$. Show that there is $M'=M'(k,A,M)$ such that
(14.31) | So f(n)| < M’x/(logx)4-***— (w > 2) 

nKx 


as follows. Firstly, reduce to the case when f = Tr. Then, prove that 
S- f(n)logn = S- As(a)f(b) + Oc,n,a.m(x/ (log x)4-*) 
n<xx ab<u,axa® 
for all $x\ge2$ and each fixed $\varepsilon\in(0,1)$. Finally, induct on the dyadic interval $(2^{j-1},2^{j}]$ containing $x$ to prove (14.31).
Exercise 14.9* Let $f:\mathbb{N}\to\mathbb{C}$ be a multiplicative function satisfying (13.1) for some fixed $A>1$ and $\kappa\in\mathbb{C}$. Assume further that (13.2) holds. Estimate the partial sums of $f$. [Hint: Write $f=\tau_\kappa*g$ and use Exercise 14.8 on $g$.]




Exercise 14.10. We extend the Dickman–de Bruijn function to all $u$ by letting $\rho(u)=0$ for $u<0$. Note then that $u\rho(u)=\int_{u-1}^{u}\rho(v)\,dv$ for all $u$.


(a) Show that $0<\Gamma(\sigma)<1$ when $1<\sigma<2$, as well as that $\Gamma(\sigma)$ is increasing for $\sigma\ge2$. [Hint: Use Theorem [14] to show that $\log\Gamma$ is convex on $\mathbb{R}_{>0}$.]


(b) For $u\ge0$, show that $0<\rho(u)\le1/\Gamma(u+1)$. [Hint: Argue by contradiction and consider the smallest $u$ that does not satisfy the inequalities $0<\rho(u)\le1/\Gamma(u+1)$.]
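As a rough numerical companion to the integral relation and to part (b), the sketch below tabulates $\rho$ on a grid. It integrates the differentiated form $u\rho'(u)=-\rho(u-1)$ of the relation above with the trapezoid rule. The step size, the range and the helper name `dickman_rho_table` are our own illustrative choices; nothing here is prescribed by the exercise.

```python
import math


def dickman_rho_table(u_max: float, h: float = 1e-3) -> list[float]:
    """Approximate rho(i*h) for 0 <= i*h <= u_max.

    rho = 1 on [0, 1]; for u > 1 we integrate rho'(v) = -rho(v - 1)/v,
    the differentiated form of u*rho(u) = int_{u-1}^{u} rho(v) dv,
    with the trapezoid rule on a uniform grid of step h.
    """
    m = round(1 / h)            # grid steps per unit, so index i - m <-> u - 1
    n = round(u_max / h)
    rho = [1.0] * (n + 1)       # exact on [0, 1]
    for i in range(m + 1, n + 1):
        u = i * h
        left = rho[i - 1 - m] / (u - h)     # rho(v - 1)/v at v = u - h
        right = rho[i - m] / u              # rho(v - 1)/v at v = u
        rho[i] = rho[i - 1] - 0.5 * h * (left + right)
    return rho


h = 1e-3
rho = dickman_rho_table(5.0, h)
print(rho[2000], 1 - math.log(2))   # rho(2) against the exact value 1 - log 2
print(rho[3000])                    # rho(3), approximately 0.0486
# part (b) on the grid, with a tiny tolerance for the equality at u = 0 and u = 1
print(all(0 < r <= 1 / math.gamma(i * h + 1) + 1e-9 for i, r in enumerate(rho)))
```

The grid is aligned with the integers so that the kinks of $\rho$ at $u=1,2,\dots$ fall on grid points, which keeps the trapezoid error of order $h^2$.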

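(c) Prove that $\rho(u)=e^{-u\log(u\log u)+O(u)}$ for $u\ge3$. [Hint: Note that if $ce^{-a(t)}\le\rho(t)\le Ce^{-b(t)}$ for $u-1\le t<u$, where $a,b:\mathbb{R}_{\ge0}\to\mathbb{R}_{\ge0}$ are increasing, then $c\int_{u-1}^{u}e^{-a(t)}\,dt\le u\rho(u)\le C\int_{u-1}^{u}e^{-b(t)}\,dt$.]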


(d) Consider the Laplace transform of the Dickman–de Bruijn function
$$\hat\rho(s)=\int_0^\infty\rho(t)e^{-st}\,dt\qquad(\mathrm{Re}(s)>0).$$
Show that $\hat\rho'(s)=\hat\rho(s)(e^{-s}-1)/s$, and conclude that there is a constant $c\in\mathbb{C}$ such that $\hat\rho(s)=e^{c-f(s)}$, where $f(s)=\int_0^s z^{-1}(1-e^{-z})\,dz$.

(e) Show that $s\hat\rho(s)\to1$ when $s\to\infty$ over positive real numbers. Use Exercise 5.4(c) to conclude that $c=\gamma$.

Exercise 14.11. 

(a) For $u\ge2$, show that $uB'(u)=-\int_{u-1}^{u}B'(v)\,dv$ and $|B'(u)|\le\rho(u)$.

(b) Show that $\lim_{u\to\infty}B(u)$ exists and equals $1+\int_1^\infty B'(u)\,du$.

(c) If $\hat B(s)=\int_1^\infty B(v)e^{-sv}\,dv$, then show that $\hat B'(s)=-e^{-s}s^{-1}(\hat B(s)+1)$. Conclude that there is a constant $d\in\mathbb{C}$ such that $\hat B(s)+1=s^{-1}e^{d+f(s)}$ with $f(s)=\int_0^s z^{-1}(1-e^{-z})\,dz$. In particular, $\lim_{s\to0^+}s\hat B(s)=e^{d}$.

(d) Let $s\to\infty$ in the formula $\hat B(s)+1=s^{-1}e^{d+f(s)}$ and use Exercise 5.4(c) to conclude that $d=-\gamma$. In particular, $\hat\rho(s)(\hat B(s)+1)=1/s$.

(e) Let $L(s)=\int_1^\infty B'(v)e^{-sv}\,dv$. Show that $\hat B(s)=(e^{-s}+L(s))/s$. Deduce that $\lim_{u\to\infty}B(u)=e^{-\gamma}$.

(f) Show that the difference $E(u):=B(u)-e^{-\gamma}$ changes signs infinitely often as $u\to\infty$. [Hint: Show that $E(u)=-\int_u^\infty B'(v)\,dv=O(e^{-u})$. On the other hand, substituting $B=E+e^{-\gamma}$ in the relation $uB'(u)=-\int_{u-1}^{u}B'(v)\,dv$, show that $uE(u)=-\int_{u-1}^{\infty}E(v)\,dv$ for $u\ge2$.]




Chapter 15 


The distribution of 
multiplicative functions 


To obtain a better understanding of multiplicative functions, it is not enough to know their asymptotic behavior. We must also study the distribution of their values. A probabilistic framework thus arises.


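Given a set $A\subseteq\{n\le x\}$, we write
$$\mathbb{P}_{n\le x}(A)=\frac{|A|}{\lfloor x\rfloor}$$
for the probability that a randomly chosen integer $n\le x$ lies in $A$. The underlying σ-algebra is naturally the power set of $\mathbb{N}_{\le x}$. Given a random variable $Z:\mathbb{N}_{\le x}\to\mathbb{C}$, we write
$$\mathbb{E}_{n\le x}[Z]=\frac{1}{\lfloor x\rfloor}\sum_{n\le x}Z(n).$$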


Within this framework, the value distribution of a real-valued function $f$ is determined by the distribution function $\mathbb{R}\ni u\mapsto\mathbb{P}_{n\le x}(f(n)\le u)$.


As in the previous chapters, we shall focus on multiplicative functions $f$ with $f(p)\approx\kappa$ on average. We then expect that $f(n)$ is roughly $\kappa^{\omega(n)}$ on average, so that the value distribution of $f$ is reduced to that of $\omega$.


The Kubilius model 


To study $\omega$, we note that $\omega(n)=\sum_{p\mid n}1=\sum_p1_{p\mid n}$. We then define the key random variables $B_d(n):=1_{d\mid n}$ for $d\in\mathbb{N}$, so that
$$\omega=\sum_{p\le x}B_p\tag{15.1}$$






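on the probability space $\mathbb{N}_{\le x}$. The functions $B_d$ are Bernoulli random variables. Since there are exactly $\lfloor x/d\rfloor$ multiples of $d$ up to $x$, we have
$$\mathbb{P}_{n\le x}(B_d(n)=1)=\frac{\lfloor x/d\rfloor}{\lfloor x\rfloor}=\frac{x/d+O(1)}{x+O(1)}=\frac1d+O\Big(\frac1x\Big).$$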


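When $d$ is fixed and $x\to\infty$, the above expression tends to $1/d$. In addition, if $p_1,\dots,p_m$ are distinct primes, then
$$\mathbb{P}_{n\le x}\big(B_{p_1}(n)=\cdots=B_{p_m}(n)=1\big)=\mathbb{P}_{n\le x}\big(B_{p_1\cdots p_m}(n)=1\big)=\frac{1}{p_1\cdots p_m}+O\Big(\frac1x\Big).\tag{15.2}$$
Therefore, if we let $x\to\infty$, we find that
$$\mathbb{P}_{n\le x}\big(B_{p_1}(n)=\cdots=B_{p_m}(n)=1\big)\sim\prod_{j=1}^{m}\mathbb{P}_{n\le x}\big(B_{p_j}(n)=1\big).$$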
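Relation (15.2) can be checked by brute force for small parameters. The sketch below is our own illustration; the primes 3, 7, 11 and the cut-off $x=10^5$ are arbitrary choices. It counts the integers $n\le x$ divisible by every prime in a list, then compares the joint probability with $1/(p_1\cdots p_m)$ and with the product of the individual probabilities.

```python
from fractions import Fraction
from math import prod


def prob_all_divide(x: int, ps: list[int]) -> Fraction:
    """P_{n<=x}(B_p(n) = 1 for every p in ps), by counting n = 1, ..., x."""
    hits = sum(1 for n in range(1, x + 1) if all(n % p == 0 for p in ps))
    return Fraction(hits, x)


x, ps = 100_000, [3, 7, 11]
joint = prob_all_divide(x, ps)
single = [prob_all_divide(x, [p]) for p in ps]

print(float(joint), 1 / prod(ps))        # joint probability vs 1/(3*7*11)
print(float(joint / prod(single)))       # close to 1: approximate independence
```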


We are thus led to the conclusion that the random variables B, for p prime 
are approximately independent from each other. 


The above analysis implies that $\omega=\sum_{p\le x}B_p$ is a sum of quasi-independent random variables. It is thus tempting to use tools from probability theory to study its value distribution. There is an obvious problem with this approach: most of the standard probabilistic tools apply to truly independent random variables.


To circumvent this problem, we introduce new Bernoulli random variables $K_p$ (living in some ambient probability space) that are completely independent of each other, and for which we have the exact equality $\mathbb{P}(K_p=1)=1/p$. The random variables $K_p$ are idealized models of $B_p$. Collectively, they form the Kubilius model of the integers.
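To make the Kubilius model concrete, here is a small Monte Carlo sketch. It draws independent Bernoulli variables $K_p$ with $\mathbb{P}(K_p=1)=1/p$ for $p\le x$ and compares the empirical mean and variance of $S$ with $\sum_{p\le x}1/p$ and $\sum_{p\le x}(1/p-1/p^2)$. The sample size, the seed and the cut-off are arbitrary assumptions made for illustration.

```python
import random


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def sample_S(ps: list[int], rng: random.Random) -> int:
    """One draw of S = sum_p K_p with independent K_p ~ Bernoulli(1/p)."""
    return sum(1 for p in ps if rng.random() < 1 / p)


x = 1000
ps = primes_up_to(x)
rng = random.Random(2024)                 # fixed seed for reproducibility
draws = [sample_S(ps, rng) for _ in range(5000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)

exact_mean = sum(1 / p for p in ps)                  # E[S]
exact_var = sum(1 / p - 1 / p ** 2 for p in ps)      # V[S], by independence
print(round(mean, 3), round(exact_mean, 3))
print(round(var, 3), round(exact_var, 3))
```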


Let us consider now the sum $S=\sum_{p\le x}K_p$, whose distribution models the function $\omega$ on the space $\mathbb{N}_{\le x}$. Since $S$ is the sum of independent random variables, we may apply to it well-established probabilistic tools. We can then hope to transfer the results on $S$ to the deterministic setting of $\omega$. However, it should be noted that the Kubilius model has its limits, as Remark … reveals. We will return to this important point in Chapter … and discuss it in more detail.

We conclude our introductory discussion of the Kubilius model by showing that it is possible to construct the random variables $K_p$ in a very natural and concrete way. We take as our probability space the set of $y$-smooth integers $S(y)$, which we equip with the probability measure
$$\mathbb{P}_{S(y)}(A)=\prod_{p\le y}\Big(1-\frac1p\Big)\cdot\sum_{n\in A}\frac1n.$$




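Notice that the random variables $(B_p)_{p\le y}$ we saw before become completely independent of one another in this new probability space: if $p_1<\cdots<p_k\le y$ and $a_1,\dots,a_k\in\mathbb{Z}_{\ge1}$, then
$$\mathbb{E}_{S(y)}\big[B_{p_1}^{a_1}\cdots B_{p_k}^{a_k}\big]=\prod_{p\le y}\Big(1-\frac1p\Big)\sum_{\substack{n\in S(y)\\ p_1\cdots p_k\mid n}}\frac1n=\frac{1}{p_1\cdots p_k}.$$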
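The independence of the $B_p$ under $\mathbb{P}_{S(y)}$ can be checked numerically for a tiny $y$. The sketch below truncates the series defining $\mathbb{P}_{S(y)}$ at $n\le N$, an approximation we introduce. For $y=5$ and $N=10^{12}$ the neglected tail is far below the precision tested. It then compares $\mathbb{E}_{S(y)}[B_6]=\mathbb{E}_{S(y)}[B_2B_3]$ with $\mathbb{E}_{S(y)}[B_2]\,\mathbb{E}_{S(y)}[B_3]$ and with $1/6$.

```python
from math import prod


def smooth_numbers(primes: list[int], limit: int) -> list[int]:
    """All n <= limit whose prime factors all lie in `primes` (n = 1 included)."""
    nums = [1]
    for p in primes:
        extended = []
        for n in nums:
            while n <= limit:           # append n, n*p, n*p^2, ... up to the limit
                extended.append(n)
                n *= p
        nums = extended
    return nums


PRIMES = [2, 3, 5]            # so S(y) with y = 5
N = 10 ** 12                  # truncation point for the series
S = smooth_numbers(PRIMES, N)
norm = prod(1 - 1 / p for p in PRIMES)


def expect_B(d: int) -> float:
    """Truncated E_{S(y)}[B_d] = prod(1 - 1/p) * sum_{n in S(y), d | n} 1/n."""
    return norm * sum(1 / n for n in S if n % d == 0)


print(expect_B(1))                                   # total mass, close to 1
print(expect_B(6), expect_B(2) * expect_B(3), 1 / 6)
```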


A Central Limit Theorem for w 


We now use the Kubilius model to study the value distribution of $\omega$. Similarly to (15.1), we have that $\omega=\sum_{p\le y}B_p$ on the space $S(y)$, that is to say, $\omega$ is a sum of independent random variables. Its mean value is
$$\mathbb{E}_{S(y)}[\omega]=\sum_{p\le y}\frac1p=\log\log y+O(1)$$


by Mertens' second estimate (Theorem 3.4(b)). Similarly, its variance equals
$$\mathbb{V}_{S(y)}[\omega]=\sum_{p\le y}\mathbb{V}_{S(y)}[B_p]=\sum_{p\le y}\Big(\frac1p-\frac1{p^2}\Big)=\log\log y+O(1),$$

where we used the independence of the variables $B_p$. Since the $B_p$'s are uniformly bounded and the variance of $\omega$ tends to infinity, Lindeberg's Central Limit Theorem (see Theorem 27.2 in [7] or Theorem 2.1.5 in [168]) implies that, for any fixed $\alpha<\beta$, we have
$$\lim_{y\to\infty}\mathbb{P}_{n\in S(y)}\Big(\alpha\le\frac{\omega(n)-\log\log y}{\sqrt{\log\log y}}\le\beta\Big)=\frac{1}{\sqrt{2\pi}}\int_\alpha^\beta e^{-t^2/2}\,dt.\tag{15.3}$$
The random variables $B_p$ are approximately independent with respect to the probability measure $\mathbb{P}_{n\le x}$, too. Thus, we might expect a similar result to hold for this measure. This was indeed proved by Erdős and Kac in 1940.


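Theorem 15.1 (Erdős–Kac). For each fixed $\alpha<\beta$, we have that
$$\lim_{x\to\infty}\mathbb{P}_{n\le x}\Big(\alpha\le\frac{\omega(n)-\log\log x}{\sqrt{\log\log x}}\le\beta\Big)=\frac{1}{\sqrt{2\pi}}\int_\alpha^\beta e^{-t^2/2}\,dt.$$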
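Theorem 15.1 can be explored numerically, with the caveat that the convergence is very slow because the normalisation only involves $\sqrt{\log\log x}$. The sketch below is our own illustration with an arbitrary cut-off. It sieves $\omega(n)$ for $n\le x$ and prints the proportion of $n$ whose normalised value lies in $[-1,1]$, next to the Gaussian value $\approx0.683$. At this height the agreement is only rough.

```python
import math


def omega_table(x: int) -> list[int]:
    """omega[n] = number of distinct prime factors of n, for 0 <= n <= x."""
    omega = [0] * (x + 1)
    for p in range(2, x + 1):
        if omega[p] == 0:                 # no smaller prime divides p, so p is prime
            for m in range(p, x + 1, p):
                omega[m] += 1
    return omega


def normal_mass(a: float, b: float) -> float:
    """P(a <= N(0,1) <= b)."""
    return 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))


x = 200_000
omega = omega_table(x)
lam = math.log(math.log(x))
z = [(omega[n] - lam) / math.sqrt(lam) for n in range(1, x + 1)]
empirical = sum(1 for t in z if -1 <= t <= 1) / x
print(f"empirical {empirical:.3f} vs Gaussian {normal_mass(-1, 1):.3f}")
```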


We will use the so-called method of moments to prove Theorem 15.1. The key result is Theorem 15.2, whose proof is given in an appendix. We write $\mathcal{N}(0,1)$ for the standard normal distribution.


Theorem 15.2. Let $(X_j)_{j\ge1}$ be a sequence of real-valued random variables.
(a) Assume that
$$\lim_{j\to\infty}\mathbb{E}[X_j^k]=\mathbb{E}[\mathcal{N}(0,1)^k]\quad\text{for all }k\in\mathbb{Z}_{\ge1}.\tag{15.4}$$
Then $(X_j)_{j\ge1}$ converges in distribution to $\mathcal{N}(0,1)$.




(b) Conversely, assume that $(X_j)_{j\ge1}$ converges in distribution to $\mathcal{N}(0,1)$. If, in addition, $\sup_{j\ge1}\mathbb{E}[X_j^{2k}]<\infty$ for all $k\in\mathbb{N}$, then (15.4) holds.


We already know that $(\omega-\log\log y)/\sqrt{\log\log y}$ converges in distribution to $\mathcal{N}(0,1)$ with respect to the measure $\mathbb{P}_{S(y)}$ when $y\to\infty$. We now check that the second hypothesis of Theorem 15.2(b) holds for it.

Lemma 15.3. Uniformly for $y\ge3$ and $k\in\mathbb{Z}_{\ge1}$, we have
$$\mathbb{E}_{n\in S(y)}\bigg[\bigg|\frac{\omega(n)-\log\log y}{\sqrt{\log\log y}}\bigg|^{k}\bigg]\ll k!.$$


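Proof. Note that $|t|^k\le k!\,e^{|t|}\le k!(e^{t}+e^{-t})$ for any $t\in\mathbb{R}$. It thus suffices to prove that
$$\mathbb{E}_{n\in S(y)}\big[e^{\alpha\omega(n)/\sqrt{\log\log y}}\big]\ll e^{\alpha\sqrt{\log\log y}}$$
for $\alpha=\pm1$. The independence of the Bernoulli random variables $B_p$ implies that
$$\mathbb{E}_{S(y)}[z^{\omega}]=\prod_{p\le y}\mathbb{E}_{S(y)}[z^{B_p}]=\prod_{p\le y}\Big(1+\frac{z-1}{p}\Big)\le\exp\Big\{\sum_{p\le y}\frac{z-1}{p}\Big\}$$
for all $z>0$. We take $z=e^{\alpha/\sqrt{\log\log y}}$, for which we have
$$z-1=e^{\alpha/\sqrt{\log\log y}}-1=\frac{\alpha}{\sqrt{\log\log y}}+O\Big(\frac{1}{\log\log y}\Big).$$
Hence, Mertens' second estimate (Theorem 3.4(b)) completes the proof.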
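The key inequality in the proof, $\prod_{p\le y}(1+(z-1)/p)\le\exp\{(z-1)\sum_{p\le y}1/p\}$, and the resulting bound $\ll e^{\alpha\sqrt{\log\log y}}$ are easy to evaluate numerically. The sketch below does so for one arbitrary value of $y$. It illustrates, but of course does not prove, the uniformity claimed in the lemma.

```python
import math


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


y = 10 ** 5
ps = primes_up_to(y)
lam = math.log(math.log(y))
recip = sum(1 / p for p in ps)

for alpha in (1, -1):
    z = math.exp(alpha / math.sqrt(lam))
    mgf = math.prod(1 + (z - 1) / p for p in ps)       # E_{S(y)}[z^omega]
    bound = math.exp((z - 1) * recip)                   # 1 + t <= e^t, factor by factor
    ratio = mgf / math.exp(alpha * math.sqrt(lam))      # should stay bounded
    print(alpha, mgf <= bound, round(ratio, 3))
```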


We are now ready to prove the Erdős–Kac theorem.


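Proof of Theorem 15.1. We follow an idea due to Billingsley [7, Section 30]. Throughout, we let $\lambda_x=\log\log x$.

By Theorem 15.2(a), it suffices to prove that the moments $\mathbb{E}_{n\le x}[((\omega(n)-\lambda_x)/\sqrt{\lambda_x})^k]$ converge to $\mathbb{E}[\mathcal{N}(0,1)^k]$ when $x\to\infty$. However, we already know that $\mathbb{E}_{n\in S(y)}[((\omega(n)-\lambda_y)/\sqrt{\lambda_y})^k]$ tends to $\mathbb{E}[\mathcal{N}(0,1)^k]$ when $y\to\infty$ by Theorem 15.2(b), which is applicable in view of the central limit theorem on $S(y)$ established above and the preceding lemma. Consequently, it suffices to show that
$$\mathbb{E}_{n\le x}\Big[\Big(\frac{\omega(n)-\lambda_x}{\sqrt{\lambda_x}}\Big)^k\Big]=\mathbb{E}_{n\in S(y)}\Big[\Big(\frac{\omega(n)-\lambda_y}{\sqrt{\lambda_y}}\Big)^k\Big]+o_{x\to\infty}(1)\tag{15.5}$$
for each fixed $k\in\mathbb{Z}_{\ge1}$, where $y=y(x)$ is an appropriate function of $x$ going to infinity.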


An integer $n\le x$ has $\le\log x/\log y$ prime factors $>y$ (see Exercise 2.…(e)). So, if we let $y=x^{1/\log\log\log x}$ and $\omega(n;y):=\#\{p\mid n:p\le y\}$, then
$$\omega(n)=\omega(n;y)+O(\log\log\log x)\qquad(n\le x).$$




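Since we also have that $|\lambda_y-\lambda_x|=o_{x\to\infty}(\sqrt{\lambda_x})$, relation (15.5) is reduced to showing that
$$\mathbb{E}_{n\le x}\big[(\omega(n;y)-\lambda_x)^k\big]=\mathbb{E}_{n\in S(y)}\big[(\omega(n)-\lambda_x)^k\big]+o_{x\to\infty}\big(\lambda_x^{k/2}\big).\tag{15.6}$$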
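Note that
$$\mathbb{E}_{n\le x}\big[(\omega(n;y)-\lambda_x)^k\big]-\mathbb{E}_{n\in S(y)}\big[(\omega(n;y)-\lambda_x)^k\big]=\sum_{j=0}^{k}\binom kj(-\lambda_x)^{k-j}\Big(\mathbb{E}_{n\le x}[\omega(n;y)^j]-\mathbb{E}_{n\in S(y)}[\omega(n;y)^j]\Big).$$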


Hence, it suffices to show that, for each fixed $j\in\mathbb{Z}\cap[0,k]$, we have
$$\mathbb{E}_{n\le x}[\omega(n;y)^j]-\mathbb{E}_{n\in S(y)}[\omega(n)^j]=O_{x\to\infty}(x^{-1/2}).$$


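We begin by noticing that $\omega(\cdot;y)^j=\sum_{p_1,\dots,p_j\le y}B_{p_1}\cdots B_{p_j}$. Taking expectations implies that
$$\mathbb{E}_{n\le x}[\omega(n;y)^j]=\sum_{p_1,\dots,p_j\le y}\mathbb{P}_{n\le x}\big(B_{p_1}(n)=\cdots=B_{p_j}(n)=1\big).$$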


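We then apply a variant of (15.2) to conclude that
$$\mathbb{E}_{n\le x}[\omega(n;y)^j]=\sum_{p_1,\dots,p_j\le y}\frac{\lfloor x/[p_1,\dots,p_j]\rfloor}{\lfloor x\rfloor}=\sum_{p_1,\dots,p_j\le y}\Big(\frac{1}{[p_1,\dots,p_j]}+O\Big(\frac1x\Big)\Big),$$
where $[p_1,\dots,p_j]$ denotes the least common multiple of the primes $p_1,\dots,p_j$. A similar calculation implies that
$$\mathbb{E}_{n\in S(y)}[\omega(n)^j]=\sum_{p_1,\dots,p_j\le y}\frac{1}{[p_1,\dots,p_j]},$$
whence $\mathbb{E}_{n\le x}[\omega(n;y)^j]-\mathbb{E}_{n\in S(y)}[\omega(n)^j]\ll\pi(y)^j/x=O_{x\to\infty}(x^{-1/2})$ by the choice of $y$. This completes the proof of Theorem 15.1.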
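For small parameters, the last two steps of the proof can be verified in exact rational arithmetic. The first is the identity $\mathbb{E}_{n\le x}[\omega(n;y)^j]=\sum_{p_1,\dots,p_j\le y}\lfloor x/[p_1,\dots,p_j]\rfloor/\lfloor x\rfloor$. The second is the bound $|\mathbb{E}_{n\le x}[\omega(n;y)^j]-\mathbb{E}_{n\in S(y)}[\omega(n)^j]|\le\pi(y)^j/\lfloor x\rfloor$. The sketch below checks both by brute force. The values of $x$, $y$ and $j$ are arbitrary illustrative choices, and `math.lcm` with several arguments needs Python 3.9+.

```python
from fractions import Fraction
from itertools import product
from math import lcm


def moment_brute(x: int, ps: list[int], j: int) -> Fraction:
    """E_{n<=x}[omega(n; y)^j], with ps the primes <= y, by direct counting."""
    total = sum(sum(1 for p in ps if n % p == 0) ** j for n in range(1, x + 1))
    return Fraction(total, x)


def moment_lcm(x: int, ps: list[int], j: int) -> Fraction:
    """Sum over (p_1, ..., p_j) of floor(x / lcm) / x, as in the proof."""
    return Fraction(sum(x // lcm(*t) for t in product(ps, repeat=j)), x)


def moment_smooth(ps: list[int], j: int) -> Fraction:
    """E_{n in S(y)}[omega(n)^j] = sum over (p_1, ..., p_j) of 1 / lcm."""
    return sum((Fraction(1, lcm(*t)) for t in product(ps, repeat=j)), Fraction(0))


x, ps = 3000, [2, 3, 5, 7, 11, 13]
for j in range(4):
    brute = moment_brute(x, ps, j)
    print(j, brute == moment_lcm(x, ps, j),
          abs(brute - moment_smooth(ps, j)) <= Fraction(len(ps) ** j, x))
```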


Dissecting sums of multiplicative functions 


Now that we have a good understanding of the distribution of $\omega$, we use it to analyze the finer structure of $\sum_{n\le x}\kappa^{\omega(n)}$. Our goal is to determine which values of $\omega$ give the dominant contribution to this sum. More concretely, we want to identify those sets $\mathcal{I}(x)\subseteq\mathbb{R}_{\ge0}$ with the property
$$\sum_{\substack{n\le x\\ \omega(n)\in\mathcal{I}(x)}}\kappa^{\omega(n)}\sim\sum_{n\le x}\kappa^{\omega(n)}\qquad(x\to\infty).\tag{15.7}$$




Of course, we could take $\mathcal{I}(x)=\mathbb{R}_{\ge0}$, but this is not so insightful. We want to find $\mathcal{I}(x)$ that is as small as possible, while still satisfying (15.7).

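Since $\omega$ has mean value $\log\log x$ and standard deviation $\sqrt{\log\log x}$ over the probability space $\mathbb{N}_{\le x}$, a natural guess is the set
$$\mathcal{I}_1(x)=\big[\log\log x-\xi(x)\sqrt{\log\log x},\ \log\log x+\xi(x)\sqrt{\log\log x}\big],$$
where $\xi(x)$ is a function tending to infinity slowly. However, note that $\kappa^{\omega(n)}=(\log x)^{\log\kappa+o(1)}$ on the set $\mathcal{A}_1(x)=\{n\le x:\omega(n)\in\mathcal{I}_1(x)\}$, whence
$$\sum_{n\in\mathcal{A}_1(x)}\kappa^{\omega(n)}=(\log x)^{\log\kappa+o(1)}\cdot|\mathcal{A}_1(x)|=x(\log x)^{\log\kappa+o(1)}\tag{15.8}$$
as $x\to\infty$. If $\kappa\ne1$, we find that $\log\kappa<\kappa-1$. Since Theorem 13.2 implies that $\sum_{n\le x}\kappa^{\omega(n)}\asymp_\kappa x(\log x)^{\kappa-1}$, the contribution of integers $n\in\mathcal{A}_1(x)$ to this sum is negligible.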
From the above discussion, we conclude that the sum $\sum_{n\le x}\kappa^{\omega(n)}$ with $\kappa\ne1$ is dominated by integers $n\le x$ with "atypical" values of $\omega(n)$ with respect to the measure $\mathbb{P}_{n\le x}$. As a matter of fact, for the purpose of identifying $\mathcal{I}(x)$, it is more natural to switch to the weighted probability measure
$$\mathbb{P}^{\kappa}_{n\le x}(A)=\frac{\sum_{n\in A}\kappa^{\omega(n)}}{\sum_{n\le x}\kappa^{\omega(n)}}.\tag{15.9}$$
We write $\mathbb{E}^{\kappa}_{n\le x}[Z]$ for the expectation of the random variable $Z:\mathbb{N}_{\le x}\to\mathbb{R}$ with respect to this measure. Finding a set satisfying (15.7) then amounts to understanding the distribution of $\omega$ with respect to the measure $\mathbb{P}^{\kappa}_{n\le x}$. It turns out that $\omega$ is approximately Gaussian with respect to $\mathbb{P}^{\kappa}_{n\le x}$ as well, but with expectation and variance $\sim\kappa\log\log x$. We may then take


$$\mathcal{I}_\kappa(x)=\big[\kappa\log\log x-\xi(x)\sqrt{\log\log x},\ \kappa\log\log x+\xi(x)\sqrt{\log\log x}\big],\tag{15.10}$$
where $\xi(x)\to\infty$. The details of this argument are outlined in Exercise 15.2.
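Under the weighted measure (15.9), the mean of $\omega$ moves with $\kappa$. Indeed, differentiating the ratio defining $\mathbb{E}^{\kappa}_{n\le x}[\omega]$ gives $\frac{d}{d\kappa}\mathbb{E}^{\kappa}_{n\le x}[\omega]=\mathbb{V}^{\kappa}_{n\le x}[\omega]/\kappa>0$. The sketch below is our own illustration with an arbitrary cut-off. It computes $\mathbb{E}^{\kappa}_{n\le x}[\omega]$ and its ratio to $\log\log x$ for a few values of $\kappa$. At such small heights the ratios track $\kappa$ only loosely, since lower-order terms are still large.

```python
import math


def omega_table(x: int) -> list[int]:
    """omega[n] = number of distinct prime factors of n, for 0 <= n <= x."""
    omega = [0] * (x + 1)
    for p in range(2, x + 1):
        if omega[p] == 0:                 # p is prime
            for m in range(p, x + 1, p):
                omega[m] += 1
    return omega


def weighted_mean_omega(kappa: float, omegas: list[int]) -> float:
    """E^kappa_{n<=x}[omega] for the measure (15.9), given omega(1), ..., omega(x)."""
    weights = [kappa ** w for w in omegas]
    return sum(w * wt for w, wt in zip(omegas, weights)) / sum(weights)


x = 100_000
omegas = omega_table(x)[1:]               # omega(n) for 1 <= n <= x
lam = math.log(math.log(x))
means = {k: weighted_mean_omega(k, omegas) for k in (0.5, 1.0, 2.0)}
for k, m in means.items():
    print(k, round(m, 3), round(m / lam, 3))
```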


Exercises 


Exercise 15.1. Deduce that Theorem … holds with $\Omega$ in place of $\omega$ too. [Hint: Use Exercise B.9.]


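Exercise 15.2. Fix $\kappa>0$, and let $\mathbb{P}^{\kappa}_{n\le x}$ be defined by (15.9). Define also
$$\mathbb{P}^{\kappa}_{S(y)}(A)=\prod_{p\le y}\Big(1+\frac{\kappa}{p-1}\Big)^{-1}\sum_{n\in A}\frac{\kappa^{\omega(n)}}{n}.$$
(a) For $d\in S(y)$, prove that $\mathbb{P}^{\kappa}_{n\in S(y)}(B_d(n)=1)=\frac{\kappa^{\omega(d)}}{d}\prod_{p\mid d}\big(1+\frac{\kappa-1}{p}\big)^{-1}$.
(b) If $\omega(d)\le k$ for some fixed $k\in\mathbb{N}$ and $x\ge d^2$, then prove that
$$\mathbb{P}^{\kappa}_{n\le x}(B_d(n)=1)=\mathbb{P}^{\kappa}_{n\in S(y)}(B_d(n)=1)+O_{k,\kappa}\Big(\frac{1}{d\log x}\Big).$$
[Hint: Use Theorem 13.2 on the function $f_d(m):=\kappa^{\omega(dm)-\omega(d)}$.]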




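(c) For each fixed $\alpha<\beta$, prove that
$$\lim_{x\to\infty}\mathbb{P}^{\kappa}_{n\le x}\Big(\alpha\le\frac{\omega(n)-\kappa\log\log x}{\sqrt{\kappa\log\log x}}\le\beta\Big)=\frac{1}{\sqrt{2\pi}}\int_\alpha^\beta e^{-t^2/2}\,dt.$$
In particular, (15.7) holds for the set $\mathcal{I}_\kappa(x)$ defined by (15.10).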


Exercise 15.3* Let $f:\mathbb{N}\to\mathbb{R}_{\ge0}$ be multiplicative. Determine necessary conditions under which $\omega$ satisfies an analogue of the Erdős–Kac theorem with respect to the measure $\mathbb{P}^{f}_{n\le x}(A)=\sum_{n\in A}f(n)/\sum_{n\le x}f(n)$.


Exercise 15.4* (a) (Landau [126]) For each fixed $k\in\mathbb{N}$, prove that
$$\mathbb{P}_{n\le x}(\omega(n)=k)\sim\frac{1}{\log x}\cdot\frac{(\log\log x)^{k-1}}{(k-1)!}\qquad(x\to\infty).$$
(b) (Hardy–Ramanujan [95]) Show there are constants $A$ and $B$ such that
$$\mathbb{P}_{n\le x}(\omega(n)=k)\le\frac{A}{\log x}\cdot\frac{(\log\log x+B)^{k-1}}{(k-1)!}$$
uniformly for $x\ge2$ and $k\in\mathbb{N}$.
[Hint: When w(n) = k+1, prove that there are at least k ways to writen = p*m 
with p < n'/(2+1) and w(m) =k. Then, induct on k.] 


Exercise 15.5* Let $q\in\mathbb{N}$ and let $\mathbb{P}_q$ denote the uniform counting measure on the set of Dirichlet characters mod $q$ (i.e., $\mathbb{P}_q(\mathcal{X}):=|\mathcal{X}|/\varphi(q)$ for any set $\mathcal{X}$ of Dirichlet characters mod $q$).

Let $Z(\chi;x)=\pi(x)^{-1/2}\sum_{p\le x}\chi(p)$. If $x,q\to\infty$ at a rate such that $x=q^{o(1)}$, and $\chi$ is sampled with respect to the measure $\mathbb{P}_q$, then prove that the random variables $\chi\mapsto\sqrt2\,\mathrm{Re}(Z(\chi;x))$ and $\chi\mapsto\sqrt2\,\mathrm{Im}(Z(\chi;x))$ both converge in distribution to the standard normal distribution. [Hint: Model $\chi(p)$ by $e^{i\theta_p}$, where $\theta_2,\theta_3,\theta_5,\dots$ is a sequence of independent random variables that are uniformly distributed on $(0,2\pi]$.]


In fact, $\sqrt2\cdot(\mathrm{Re}(Z(\chi;x)),\mathrm{Im}(Z(\chi;x)))$ converges to a 2-dimensional Gaussian: when $q\to\infty$ and $x=q^{o(1)}$, the quantity $\mathbb{P}_q(\mathrm{Re}(Z(\chi;x))\le u/\sqrt2,\ \mathrm{Im}(Z(\chi;x))\le v/\sqrt2)$ tends to $\mathbb{P}(\mathcal{N}(0,1)\le u)\cdot\mathbb{P}(\mathcal{N}(0,1)\le v)$. Proving this result requires a 2-dimensional analogue of Theorem … and is an excellent exercise on the method of moments.


Chapter 16 


Large deviations 


Let $X$ be a real-valued random variable with expectation $\mu=\mathbb{E}[X]$. In many cases, most of the mass of $X$ is concentrated around $\mu$. To measure this concentration, we seek to estimate the rate of decay to 0 of the probabilities $\mathbb{P}(X\ge\mu+u)$ and $\mathbb{P}(X\le\mu-u)$, that is to say, how "heavy" the right and left tails of the distribution of $X$ are.

If $X$ is exponential, i.e., it has density $1_{x>0}\cdot e^{-x}$, then $\mu=1$, $\mathbb{P}(X>1+u)=e^{-1-u}$ and $\mathbb{P}(X<1-u)=0$ for $u\ge1$. On the other hand, the tails of $\mathcal{N}(0,1)$ are of size $\approx\exp(-u^2/2)$, which is much smaller than $e^{-1-u}$. Hence, a Gaussian is more concentrated than an exponential distribution.


As another example, consider the distribution of $\omega$ with respect to the measure $\mathbb{P}^{\kappa}_{n\le x}$ defined in (15.9). Understanding where the mass of this distribution lies is essentially equivalent to finding a set $\mathcal{I}(x)$ satisfying (15.7).

The study of tails of distributions is the subject matter of Cramér's theory of large deviations ([168, Section 1.3], [7, Chapter 9]). For simplicity, let us assume that $X$ is normalized so that $\mu=0$. We focus on understanding the frequency of occurrence of the event $\{X\ge u\}$. Since $\{X\le-u\}=\{-X\ge u\}$, this treats left tails as well upon replacing $X$ by $-X$.


The main tool in the study of $\{X\ge u\}$ is the Laplace transform of $X$,
$$\mathcal{L}_X(s)=\mathbb{E}[e^{sX}],$$
which is typically defined in some vertical strip $c_1<\mathrm{Re}(s)<c_2$. The simplest way of using $\mathcal{L}_X(s)$ to estimate $\mathbb{P}(X\ge u)$ is via Chernoff's inequality (which is a simple consequence of Markov's inequality): for any $\sigma\in I:=(c_1,c_2)\cap(0,+\infty)$, we have
$$\mathbb{P}(X\ge u)=\mathbb{P}\big(e^{\sigma(X-u)}\ge1\big)\le\mathbb{E}\big[e^{\sigma(X-u)}\big]=e^{-\sigma u}\mathcal{L}_X(\sigma).\tag{16.1}$$
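Chernoff's inequality (16.1) holds for any real random variable whose Laplace transform is finite at $\sigma$, centred or not. As an illustration (our choice of example, not the book's), the sketch below takes $X$ to be the Kubilius-type sum $\sum_{p\le1000}K_p$. It computes the exact tail $\mathbb{P}(X\ge u)$ by dynamic programming and compares it with $\min_\sigma e^{-\sigma u}\mathcal{L}_X(\sigma)$ over a grid of $\sigma>0$.

```python
import math


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def exact_pmf(probs: list[float]) -> list[float]:
    """Distribution of a sum of independent Bernoulli(q) variables, q in probs."""
    pmf = [1.0]
    for q in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - q)       # this summand equals 0
            nxt[k + 1] += mass * q         # this summand equals 1
        pmf = nxt
    return pmf


def laplace(sigma: float, probs: list[float]) -> float:
    """L_X(sigma) = E[e^{sigma X}] = prod_q (1 - q + q e^sigma)."""
    return math.prod(1 - q + q * math.exp(sigma) for q in probs)


def chernoff(u: float, probs: list[float], sigmas: list[float]) -> float:
    """min over the grid of e^{-sigma u} L_X(sigma): an upper bound for P(X >= u)."""
    return min(math.exp(-s * u) * laplace(s, probs) for s in sigmas)


probs = [1 / p for p in primes_up_to(1000)]
pmf = exact_pmf(probs)
sigmas = [0.01 * i for i in range(1, 501)]
for u in (4, 6, 8):
    print(u, sum(pmf[u:]), chernoff(u, probs, sigmas))
```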






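If the function $\sigma\mapsto e^{-\sigma u}\mathcal{L}_X(\sigma)$ is minimized at $\sigma=\alpha(u)\in I$, then its derivative must vanish at $\sigma=\alpha$, whence
$$\mathcal{L}_X'(\alpha)=u\,\mathcal{L}_X(\alpha).\tag{16.2}$$
We assume for simplicity that the equation $\mathcal{L}_X'(\sigma)=u\,\mathcal{L}_X(\sigma)$ has a unique solution in $I$, so that $\alpha$ is determined by (16.2).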
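For a sum $X=\sum_pK_p$ of independent Bernoulli$(1/p)$ variables (our illustrative choice, with $p\le1000$), equation (16.2) reads $(\log\mathcal{L}_X)'(\alpha)=\sum_p\frac{e^{\alpha}/p}{1-1/p+e^{\alpha}/p}=u$. The left side increases with $\alpha$, so bisection finds the unique root whenever $\mathbb{E}[X]<u<\#\{p\}$. The sketch below then checks that $\sigma\mapsto e^{-\sigma u}\mathcal{L}_X(\sigma)$ is indeed smallest there.

```python
import math

PRIMES = [p for p in range(2, 1001) if all(p % d for d in range(2, math.isqrt(p) + 1))]
probs = [1 / p for p in PRIMES]


def log_laplace(s: float) -> float:
    """log L_X(s) for X = sum of independent Bernoulli(q), q in probs."""
    return sum(math.log(1 - q + q * math.exp(s)) for q in probs)


def log_laplace_prime(s: float) -> float:
    """(log L_X)'(s) = L_X'(s) / L_X(s), increasing in s."""
    return sum(q * math.exp(s) / (1 - q + q * math.exp(s)) for q in probs)


def saddle(u: float, lo: float = 0.0, hi: float = 50.0, steps: int = 200) -> float:
    """Solve L_X'(alpha) = u L_X(alpha) by bisection (needs E[X] < u < len(probs))."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if log_laplace_prime(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def objective(s: float, u: float) -> float:
    """log of e^{-s u} L_X(s), the quantity minimised in (16.1)."""
    return -s * u + log_laplace(s)


u = 6
alpha = saddle(u)
print(alpha, log_laplace_prime(alpha))
print(objective(alpha, u) <= min(objective(alpha - 0.05, u), objective(alpha + 0.05, u)))
```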


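The above method can often yield lower bounds on $\mathbb{P}(X\ge u)$ as well: in many cases it turns out that for the optimal choice $\sigma=\alpha$ the integral $\mathcal{L}_X(\alpha)=\int e^{\alpha X}\,d\mathbb{P}$ is dominated by values of $X\approx u$, that is to say,
$$\mathcal{L}_X(\alpha)\approx\int_{X\approx u}e^{\alpha X}\,d\mathbb{P}\approx e^{\alpha u}\,\mathbb{P}(X\approx u).$$
Hence, we also have the rough lower bound $\mathbb{P}(X\ge u)\gtrsim e^{-\alpha u}\mathcal{L}_X(\alpha)$.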


A more sophisticated approach is to use the inverse Laplace transform: for $c\in I$, Perron's inversion formula implies that
$$\mathbb{P}(X\ge u)=\mathbb{E}\Big[\frac{1}{2\pi i}\int_{(c)}e^{s(X-u)}\,\frac{ds}{s}\Big]=\frac{1}{2\pi i}\int_{(c)}\mathcal{L}_X(s)e^{-su}\,\frac{ds}{s},\tag{16.3}$$
provided, of course, that we can justify an application of Fubini's theorem. The choice of $c$ is crucial here, and we take $c=\alpha$. This is because if we write $\mathcal{L}_X(s)e^{-su}=e^{f(s)}$, then $f'(\alpha)=0$, so that the integrand has a stationary point at $s=\alpha$. Under some mild conditions, this integral is then dominated by values of $s\approx\alpha$. We may thus obtain an asymptotic evaluation for $\mathbb{P}(X\ge u)$ using Taylor's theorem for the integrand, much like we did in the proof of formula (1.20).


We present below three applications of this circle of ideas. 


Dissecting sums of multiplicative functions: Encore 


First, we use the method of large deviations to obtain information on the multiplicative structure of a "typical" integer with respect to the probability measure $\mathbb{P}^{\kappa}_{n\le x}$. Results of this kind fall under the subfield of number theory called Anatomy of Integers. To state our theorem, we let $p_1(n)<p_2(n)<\cdots<p_{\omega(n)}(n)$ denote the sequence of the distinct prime factors of $n$ in increasing order.


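Theorem 16.1. Fix $\kappa,\varepsilon>0$ and a function $\xi:\mathbb{R}_{\ge0}\to\mathbb{R}_{\ge0}$ tending to infinity. Let $\mathcal{A}(x)$ be the set of integers $n\le x$ with
$$\frac{j}{\kappa+\varepsilon}\le\log\log p_j(n)\le\frac{j}{\kappa-\varepsilon}\qquad(\xi(x)\le j\le\omega(n))$$
and $\omega(n)/\log\log x\in[\kappa-\varepsilon,\kappa+\varepsilon]$. Then $\mathbb{P}^{\kappa}_{n\le x}(\mathcal{A}(x))=1-o_{x\to\infty}(1)$.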




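Proof. All implied constants might depend on $\kappa$. Recall the notation
$$\omega(n;y)=\#\{p\mid n:p\le y\}=\sum_{p\le y}B_p(n).$$
Since $\omega(n;p_j(n))=j$, the theorem will follow if we can show that, for each fixed $\varepsilon\in(0,\kappa)$ and each fixed function $\psi:\mathbb{R}_{\ge0}\to\mathbb{R}_{\ge0}$ with $\lim_{x\to\infty}\psi(x)=\infty$, the event
$$(\kappa-\varepsilon)\log\log y\le\omega(n;y)\le(\kappa+\varepsilon)\log\log y\qquad(\psi(x)\le y\le x)\tag{16.4}$$
occurs with probability $1-o_{x\to\infty}(1)$ with respect to the measure $\mathbb{P}^{\kappa}_{n\le x}$.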


Since $\mathbb{E}^{\kappa}_{n\le x}[\omega(n;y)]\sim\kappa\log\log y$, this amounts to showing a simultaneous concentration-of-measure inequality for all of the random variables $\omega(\cdot;y)$.


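To accomplish the above task, we use Chernoff's inequality (16.1) (also called Rankin's trick in this context): for all $u\ge\kappa$ and all $\sigma\ge0$, we have
$$\mathbb{P}^{\kappa}_{n\le x}\big(\omega(n;y)\ge u\log\log y\big)\le(\log y)^{-\sigma u}\,\mathbb{E}^{\kappa}_{n\le x}\big[e^{\sigma\omega(n;y)}\big].\tag{16.5}$$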
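In addition, we have
$$\mathbb{E}^{\kappa}_{n\le x}\big[e^{\sigma\omega(n;y)}\big]\asymp\frac{1}{x(\log x)^{\kappa-1}}\sum_{n\le x}\kappa^{\omega(n)}e^{\sigma\omega(n;y)}\ll_\sigma(\log y)^{\kappa(e^{\sigma}-1)},\tag{16.6}$$
where the first estimate follows from Theorem 13.2, and the second one from Theorem 14.3 (applied to the function $f(n)=\kappa^{\omega(n)}e^{\sigma\omega(n;y)}$, for which $f(p)=\kappa+\kappa(e^{\sigma}-1)1_{p\le y}$) and from Mertens' second estimate (Theorem 3.4(b)). We insert (16.6) into (16.5) and optimize the resulting upper bound by taking $\sigma=\log(u/\kappa)$. This yields the inequality
$$\mathbb{P}^{\kappa}_{n\le x}\big(\omega(n;y)\ge u\log\log y\big)\ll(\log y)^{-\kappa Q(u/\kappa)}$$
uniformly for $y\in[2,x]$ and $\kappa\le u\le100\kappa$, where $Q(t):=t\log t-t+1$. A very similar argument also proves that
$$\mathbb{P}^{\kappa}_{n\le x}\big(\omega(n;y)\le u\log\log y\big)\ll(\log y)^{-\kappa Q(u/\kappa)}\qquad(2\le y\le x,\ 0\le u\le\kappa).$$
We thus arrive at the concentration-of-measure inequality
$$\mathbb{P}^{\kappa}_{n\le x}\big(|\omega(n;y)-\kappa\log\log y|\ge\varepsilon\log\log y\big)\ll_\varepsilon(\log y)^{-\delta(\kappa,\varepsilon)},\tag{16.7}$$
where $\delta(\kappa,\varepsilon)=\kappa\min\{Q(1+\varepsilon/\kappa),Q(1-\varepsilon/\kappa)\}>0$. This establishes the theorem for a fixed value of $y$.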
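The optimisation in $\sigma$ can be double-checked numerically. With $\sigma=\log(u/\kappa)$, the exponent $-\sigma u+\kappa(e^{\sigma}-1)$ of $\log y$ equals $-\kappa Q(u/\kappa)$, and this choice minimises the exponent over $\sigma$, because the exponent is convex in $\sigma$. The following is a minimal check for a few arbitrary pairs $(\kappa,u)$. The last pair has $u<\kappa$, which corresponds to the lower-tail bound with $\sigma<0$.

```python
import math


def Q(t: float) -> float:
    """Q(t) = t log t - t + 1 (with Q(0) = 1)."""
    return 1.0 if t == 0 else t * math.log(t) - t + 1


def exponent(sigma: float, u: float, kappa: float) -> float:
    """Exponent of log y in (log y)^{-sigma u} * (log y)^{kappa (e^sigma - 1)}."""
    return -sigma * u + kappa * (math.exp(sigma) - 1)


CASES = [(1.0, 1.5), (2.0, 3.0), (0.5, 0.2)]    # (kappa, u)
for kappa, u in CASES:
    sigma_opt = math.log(u / kappa)
    grid = [sigma_opt + 0.01 * i for i in range(-200, 201)]
    best = min(exponent(s, u, kappa) for s in grid)
    print(kappa, u, exponent(sigma_opt, u, kappa), -kappa * Q(u / kappa), best)
```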


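We pass to a result for all values of $y$ using a simple trick: we fix the checkpoints $y_j=\min\{\psi(x)^{2^{j}},x\}$, $j\ge0$, and let $J$ be such that $y_{J-1}<x=y_J$. Then, the union bound and relation (16.7) with $\varepsilon/2$ in place of $\varepsilon$ imply that
$$\mathbb{P}^{\kappa}_{n\le x}\Big(\bigcup_{j=0}^{J}\big\{|\omega(n;y_j)-\kappa\log\log y_j|\ge\tfrac{\varepsilon}{2}\log\log y_j\big\}\Big)\ll\sum_{j=0}^{J}(\log y_j)^{-\delta(\kappa,\varepsilon/2)}\ll_{\kappa,\varepsilon}(\log\psi(x))^{-\delta(\kappa,\varepsilon/2)}.$$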




Now, let $n\le x$ be such that $|\omega(n;y_j)-\kappa\log\log y_j|<0.5\varepsilon\log\log y_j$ whenever $0\le j\le J$. We know that this holds with probability $1-o_{x\to\infty}(1)$. In addition, if $y\in[\psi(x),x]$, then there is $j\in\{1,\dots,J\}$ such that $y_{j-1}\le y\le y_j$. Hence
$$\omega(n;y)\ge\omega(n;y_{j-1})\ge(\kappa-\varepsilon/2)\log\log y_{j-1}\ge(\kappa-\varepsilon)\log\log y,$$
provided that $x$ is large enough (so that $y\ge\psi(x)$ is also large enough).

Starting with the inequality $\omega(n;y)\le\omega(n;y_j)$, we may also prove that $\omega(n;y)\le(\kappa+\varepsilon)\log\log y$. This completes the proof of the theorem.


The saddle-point method: Encore 


In Theorem 15.1, we let the value of $\omega$ vary in certain wide intervals. Now, we study the proportion of integers $n\le x$ for which $\omega(n)$ takes a given value $k$. This is a rare event, so we will study it using the method of large deviations and the theory of the inverse Laplace transform.

Theorem 16.2. Fix $C>0$. For $x>1$ and $k\in\mathbb{Z}\cap[1,C\log\log x]$, we have
$$\mathbb{P}_{n\le x}(\omega(n)=k)=\frac{G(\alpha)}{\Gamma(\alpha+1)}\cdot\frac{(\log\log x)^{k-1}}{(k-1)!\,\log x}\Big(1+O_C\Big(\frac{k}{(\log\log x)^2}\Big)\Big),$$
where $\alpha=(k-1)/\log\log x$ and
$$G(z)=\prod_p\Big(1+\frac{z}{p-1}\Big)\Big(1-\frac1p\Big)^{z}.$$
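The main term of Theorem 16.2 is easy to evaluate. The sketch below approximates $G(z)$ by truncating the product at $p\le10^5$, an arbitrary cut-off of ours. The factors are $1+O_z(1/p^2)$, so the truncation error is small. It prints the predicted and actual numbers of $n\le x$ with $\omega(n)=k$. Every factor of $G(1)$ equals $\frac{p}{p-1}\cdot\frac{p-1}{p}=1$, so $G(1)=1$. At this height the relative error $O(k/(\log\log x)^2)$ is not small, so only rough agreement should be expected.

```python
import math


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


PRIMES = primes_up_to(10 ** 5)    # truncation point for the Euler product (our choice)


def G(z: float) -> float:
    """Truncated G(z) = prod_{p <= 10^5} (1 + z/(p - 1)) (1 - 1/p)^z, for real z >= 0."""
    return math.prod((1 + z / (p - 1)) * (1 - 1 / p) ** z for p in PRIMES)


def predicted_count(x: int, k: int) -> float:
    """x times the main term of Theorem 16.2."""
    lam = math.log(math.log(x))
    a = (k - 1) / lam
    return (x * G(a) / math.gamma(a + 1)
            * lam ** (k - 1) / (math.factorial(k - 1) * math.log(x)))


def omega_counts(x: int) -> dict[int, int]:
    """counts[k] = #{1 <= n <= x : omega(n) = k}."""
    omega = [0] * (x + 1)
    for p in range(2, x + 1):
        if omega[p] == 0:                    # p is prime
            for m in range(p, x + 1, p):
                omega[m] += 1
    counts: dict[int, int] = {}
    for w in omega[1:]:
        counts[w] = counts.get(w, 0) + 1
    return counts


x = 200_000
counts = omega_counts(x)
for k in range(1, 6):
    print(k, counts.get(k, 0), round(predicted_count(x, k)))
```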


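Proof. We may assume that $k\ge2$, with the case $k=1$ following from the Prime Number Theorem.

Since $\omega$ takes values in $\mathbb{Z}_{\ge0}$, it is easy to invert the Laplace transform here: using Cauchy's residue theorem, we readily find that
$$\mathbb{P}_{n\le x}(\omega(n)=k)=\mathbb{E}_{n\le x}\Big[\frac{1}{2\pi i}\oint_{|z|=r}\frac{z^{\omega(n)}\,dz}{z^{k+1}}\Big]=\frac{1}{2\pi i}\oint_{|z|=r}\mathbb{E}_{n\le x}[z^{\omega(n)}]\,\frac{dz}{z^{k+1}}$$
for any $r>0$. We estimate the integrand using Theorem 13.2: we have
$$\mathbb{E}_{n\le x}[z^{\omega(n)}]=H(z)\,z\,(\log x)^{z-1}+O_r\big((\log x)^{\mathrm{Re}(z)-2}\big)\tag{16.8}$$
uniformly for $|z|=r$, where $H(z):=G(z)/\Gamma(z+1)$. As a consequence,
$$\mathbb{P}_{n\le x}(\omega(n)=k)=\frac{1}{2\pi i}\oint_{|z|=r}\Big(H(z)(\log x)^{z-1}+O_r\big((\log x)^{\mathrm{Re}(z)-2}\big)\Big)\frac{dz}{z^{k}}.\tag{16.9}$$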
The function $H(z)$ is entire and has bounded derivatives on compact sets. Hence, it will not affect the order of magnitude of the integral. If it were not present, nor did we have an error term, we could use Cauchy's theorem to find that
$$\frac{1}{2\pi i}\oint_{|z|=r}(\log x)^{z-1}\frac{dz}{z^{k}}=\frac{1}{\log x}\cdot\frac{(\log\log x)^{k-1}}{(k-1)!}.\tag{16.10}$$
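Identity (16.10) can be checked numerically, together with the vanishing of the integral of $(z-\alpha)(\log x)^{z-1}/z^{k}$ on $|z|=\alpha$ used later in the proof. For integrands that are analytic on an annulus around the circle, the trapezoid rule converges extremely fast. The sketch below is our own check; the values of $x$ and $k$ and the number of nodes $M$ are arbitrary.

```python
import cmath
import math


def circle_integral(f, r: float, M: int = 256) -> complex:
    """(1/(2 pi i)) times the contour integral of f over |z| = r, by the trapezoid rule.

    With z = r e^{i theta}, dz = i z dtheta, so the value is the mean of f(z) * z.
    """
    total = 0j
    for m in range(M):
        z = r * cmath.exp(2j * math.pi * m / M)
        total += f(z) * z
    return total / M


x, k = 1e10, 5
lam = math.log(math.log(x))            # log log x
alpha = (k - 1) / lam                  # the saddle point chosen in the text


def integrand(z: complex) -> complex:
    """(log x)^{z-1} / z^k."""
    return cmath.exp((z - 1) * lam) / z ** k


def shifted(z: complex) -> complex:
    """(z - alpha) (log x)^{z-1} / z^k, whose integral should vanish."""
    return (z - alpha) * integrand(z)


numeric = circle_integral(integrand, alpha)
exact = lam ** (k - 1) / (math.factorial(k - 1) * math.log(x))   # right side of (16.10)
vanishing = circle_integral(shifted, alpha)
print(numeric, exact, abs(vanishing))
```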




The main idea is to choose $r$ in a way that the mass of both integrals is concentrated around the point $z=r$, because then we can replace $H(z)$ by $H(r)$ in the first of them at the cost of a small error, and then apply (16.10).

To carry out the above strategy, we use the saddle-point method: we pick $r=\alpha=(k-1)/\log\log x$, so that, if we write $(\log x)^{z-1}/z^{k}=e^{\phi(z)}/z$, then $\phi'(\alpha)=0$. (We have left a $z$ in the denominator because $dz/z$ is the natural invariant measure on the circle $|z|=r$.) For this choice of $r$, it can be seen using quadratic approximation that most of the mass of the integral is on the arc $|z-\alpha|\le\alpha/\sqrt{k}$.


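Now, we write $\mathbb{P}_{n\le x}(\omega(n)=k)=I_1+I_2$, where
$$I_1=\frac{1}{2\pi i}\oint_{|z|=\alpha}H(\alpha)(\log x)^{z-1}\frac{dz}{z^{k}}=\frac{H(\alpha)}{\log x}\cdot\frac{(\log\log x)^{k-1}}{(k-1)!}$$
by Cauchy's theorem, and
$$\begin{aligned}I_2&=\frac{1}{2\pi i}\oint_{|z|=\alpha}\big(H(z)-H(\alpha)+O_C(1/\log x)\big)(\log x)^{z-1}\frac{dz}{z^{k}}\\&=\frac{1}{2\pi i}\oint_{|z|=\alpha}\big((z-\alpha)H'(\alpha)+O_C(|z-\alpha|^2+1/\log x)\big)(\log x)^{z-1}\frac{dz}{z^{k}}\end{aligned}$$
by Taylor's theorem. The integral of $(z-\alpha)(\log x)^{z-1}/z^{k}$ vanishes by Cauchy's theorem and the choice of $\alpha$. We bound the remaining part of $I_2$ by making the change of variables $z=\alpha e^{i\theta}$ with $\theta\in[-\pi,\pi]$. In conclusion, we have arrived at the estimate
$$I_2\ll_C\int_{-\pi}^{\pi}\frac{(\log x)^{\alpha\cos\theta-1}}{\alpha^{k-1}}\Big(\alpha^2|e^{i\theta}-1|^2+\frac{1}{\log x}\Big)\,d\theta.$$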


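For all $\theta\in[-\pi,\pi]$, we have $|e^{i\theta}-1|\le|\theta|$, by Taylor's theorem. In addition, since $\alpha\log\log x=k-1$,
$$(\log x)^{\alpha\cos\theta}=e^{(k-1)\cos\theta}\le e^{(k-1)(1-\theta^2/2+\theta^4/24)}\le e^{(k-1)(1-c\theta^2)},$$
where $c=(1-\pi^2/12)/2\approx0.0887$. Therefore,
$$\begin{aligned}I_2&\ll_C\frac{e^{k-1}}{\alpha^{k-1}\log x}\int_{-\pi}^{\pi}\Big(\alpha^2\theta^2+\frac{1}{\log x}\Big)e^{-c(k-1)\theta^2}\,d\theta\ll\frac{e^{k-1}}{\alpha^{k-1}\log x}\Big(\alpha^2k^{-3/2}+\frac{k^{-1/2}}{\log x}\Big)\\&\ll\frac{k}{(\log\log x)^2}\cdot\frac{1}{\log x}\cdot\frac{(\log\log x)^{k-1}}{(k-1)!}\end{aligned}$$
by Stirling's formula and the choice of $\alpha$. Finally, we note that $H(\alpha)\asymp_C1$ for $\alpha\in[0,C]$ to complete the proof.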




Smooth numbers 


The theory of large deviations can be used to obtain strong quantitative bounds for the number of $y$-smooth numbers $\le x$. Notice that
$$\mathbb{E}_{n\in S(y)}[\log n]=\mathbb{E}_{n\in S(y)}\Big[\sum_{d\mid n}\Lambda(d)\Big]=\sum_{d\in S(y)}\frac{\Lambda(d)}{d}=\sum_{p\le y}\frac{\log p}{p-1}\sim\log y$$
as $y\to\infty$. Thus, if $n$ is $y$-smooth and $>x=y^{u}$, then $\log n$ is approximately $u$ times larger than its expected size. This offers a heuristic explanation of why the Dickman–de Bruijn function decays so fast.
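The identity $\mathbb{E}_{n\in S(y)}[\log n]=\sum_{p\le y}\log p/(p-1)$ can be checked for a tiny $y$ by truncating the series that defines $\mathbb{P}_{S(y)}$. The truncation at $N=10^{12}$ is our own approximation; for $y=7$ the neglected tail is far below the tolerance tested.

```python
import math


def smooth_numbers(primes: list[int], limit: int) -> list[int]:
    """All n <= limit whose prime factors all lie in `primes` (n = 1 included)."""
    nums = [1]
    for p in primes:
        extended = []
        for n in nums:
            while n <= limit:
                extended.append(n)
                n *= p
        nums = extended
    return nums


PRIMES = [2, 3, 5, 7]                     # S(y) with y = 7
N = 10 ** 12                              # truncation point for the series
S = smooth_numbers(PRIMES, N)
norm = math.prod(1 - 1 / p for p in PRIMES)
mean_log = norm * sum(math.log(n) / n for n in S)            # truncated E_{S(y)}[log n]
formula = sum(math.log(p) / (p - 1) for p in PRIMES)         # sum_{p<=y} log p/(p-1)
print(mean_log, formula)
```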

Optimizing Chernoff’s inequality (16.1) (often referred to as Rankin’s 
trick in this context) leads us to the following general theorem. 


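Theorem 16.3. Let $f$ be a multiplicative function such that $0\le f\le\tau_k$ for some $k\in\mathbb{Z}_{\ge1}$. Let $x\ge y\ge3$ and $u=\log x/\log y$. If $y\ge(\log x)^{2+\delta}$ for some $\delta>0$, then
$$\sum_{\substack{n\in S(y)\\ n>x}}\frac{f(n)}{n}\ll\frac{e^{O_{k,\delta}(u)}}{(u\log(2u))^{u}}\exp\Big\{\sum_{p\le y}\frac{f(p)}{p}\Big\}.$$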


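Proof. All implied constants might depend on $\delta$ and $k$. We may assume that $u$ is large enough and that $\delta<1/2$. Let $\varepsilon\in[1/\log y,1/2-\delta/5]$ be chosen later. We have
$$\sum_{\substack{n\in S(y)\\ n>x}}\frac{f(n)}{n}\le\sum_{n\in S(y)}\frac{(n/x)^{\varepsilon}f(n)}{n}=x^{-\varepsilon}\prod_{p\le y}\Big(1+\sum_{m\ge1}\frac{f(p^m)}{p^{m(1-\varepsilon)}}\Big).$$
To each factor, we apply the inequality $1+t\le e^{t}$. Our assumptions that $0\le f\le\tau_k$ and that $\varepsilon\le1/2-\delta/5$ imply that $\sum_{p\le y}\sum_{m\ge2}f(p^m)/p^{m(1-\varepsilon)}\ll1$. Consequently,
$$\sum_{\substack{n\in S(y)\\ n>x}}\frac{f(n)}{n}\ll\exp\Big\{-\varepsilon\log x+\sum_{p\le y}\frac{f(p)}{p^{1-\varepsilon}}\Big\}.\tag{16.11}$$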

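Next, we write $\varepsilon=w/\log y$, so that $1\le w\le(1/2-\delta/5)\log y$. For the primes $p\le y^{1/w}$, we note that $p^{w/\log y}=1+O(w\log p/\log y)$. Thus
$$\sum_{p\le y^{1/w}}\frac{f(p)}{p^{1-\varepsilon}}=\sum_{p\le y^{1/w}}\frac{f(p)}{p}+O\Big(\frac{w}{\log y}\sum_{p\le y^{1/w}}\frac{\log p}{p}\Big)=\sum_{p\le y^{1/w}}\frac{f(p)}{p}+O(1)$$
by Mertens' estimates (Theorem 3.4). For the bigger primes, we use Chebyshev's estimate (Theorem 2.4) and partial summation to find that
$$\sum_{y^{1/w}<p\le y}\frac{f(p)}{p^{1-\varepsilon}}\le k\sum_{y^{1/w}<p\le y}\frac{1}{p^{1-w/\log y}}\ll\int_{y^{1/w}}^{y}\frac{t^{w/\log y}}{t\log t}\,dt.$$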




Making the change of variables $t^{w/\log y}=e^{v}$, we deduce that the right-hand side is $\ll e^{w}/w$.

Putting the above estimates together, we conclude that
$$\sum_{\substack{n\in S(y)\\ n>x}}\frac{f(n)}{n}\ll\exp\Big\{-uw+O(e^{w}/w)+\sum_{p\le y}\frac{f(p)}{p}\Big\}.$$
We choose $w\ge1$ implicitly via the formula $e^{w-1}/w=u$. Taking logarithms, we find that $w-1-\log w=\log u$. In particular, $w\asymp\log u$, whence $\log w=\log\log u+O(1)$. We appeal again to the identity $w-1-\log w=\log u$ to find that $w=\log(u\log u)+O(1)$. If we can show that $w\le(1/2-\delta/5)\log y$ for this choice of $w$, the theorem will follow. This inequality is true if
$$\frac{e^{(1/2-\delta/5)\log y-1}}{(1/2-\delta/5)\log y}\ge u\iff y^{1/2-\delta/5}\ge e(1/2-\delta/5)\log x\iff y\ge c\,(\log x)^{1/(1/2-\delta/5)},$$
where $c=(e(1/2-\delta/5))^{1/(1/2-\delta/5)}$. Since $1/(1/2-\delta/5)<2+\delta$ for $0<\delta<1/2$, the last inequality is indeed satisfied for large $x$ by our assumption that $y\ge(\log x)^{2+\delta}$, thus concluding the proof.
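The parameter $w$ in the proof is defined only implicitly. The sketch below is our own illustration. It solves $e^{w-1}/w=u$ by bisection, using that $w\mapsto e^{w-1}/w$ increases on $[1,\infty)$ from the value 1. It then shows that $w-\log(u\log u)$ indeed stays bounded.

```python
import math


def solve_w(u: float) -> float:
    """Solve e^{w-1}/w = u for w >= 1 (requires u >= 1), by bisection.

    g(w) = e^{w-1}/w is increasing on [1, oo) with g(1) = 1, so the root is unique.
    """
    def g(w: float) -> float:
        return math.exp(w - 1) / w

    lo, hi = 1.0, 2.0
    while g(hi) < u:          # bracket the root
        hi *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for u in (10, 100, 10 ** 4, 10 ** 6):
    w = solve_w(u)
    # residual of w - 1 - log w = log u, and the gap w - log(u log u), which stays O(1)
    print(u, round(w, 4), w - 1 - math.log(w) - math.log(u),
          round(w - math.log(u * math.log(u)), 4))
```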


Using the method of proof of Theorem 14.2, we can deduce from Theorem 16.3 an analogous result for the arithmetic mean of a multiplicative function over $y$-smooth integers. Taking $f=1$, we find a uniform bound for the function $\Psi(x,y)$ that we encountered in Chapter 14.

Theorem 16.4. Assume the set-up of Theorem 16.3. Then
$$\sum_{n\in S(y)\cap[1,x]}f(n)\ll\frac{e^{O_{k,\delta}(u)}}{(u\log(2u))^{u}}\cdot\frac{x}{\log x}\exp\Big\{\sum_{p\le y}\frac{f(p)}{p}\Big\}.$$


Proof. Without loss of generality, $\delta<1/2$. In addition, by virtue of Theorem 14.2, we may assume that $u$ is large. Lastly, the condition $y\ge(\log x)^{2+\delta}$ allows us to also assume that $x$ and $y$ are large. As in the proof of Theorem 16.3, we consider a parameter $\varepsilon\in[1/\log y,1/2-\delta/5]$.

We use a variation of the proof of Theorem 14.2 that links mean values of multiplicative functions to logarithmic mean values. Mimicking the argument there, we start by writing
$$(\log x)\sum_{n\in S(y)\cap[1,x]}f(n)=S_1+S_2,\tag{16.12}$$
where
$$S_1=\sum_{n\in S(y)\cap[1,x]}f(n)\log(x/n)\qquad\text{and}\qquad S_2=\sum_{n\in S(y)\cap[1,x]}f(n)\log n.$$




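For $S_1$, we note that $\log(x/n)\le(x/n)^{1-\varepsilon}/(1-\varepsilon)\le2(x/n)^{1-\varepsilon}$ and thus
$$S_1\le2x^{1-\varepsilon}\sum_{n\in S(y)}\frac{f(n)}{n^{1-\varepsilon}}.\tag{16.13}$$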


Next, we bound 52. We do not have control on Ay because we have not 
assumed that tT, = f. Instead, we note that 


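$$f(n)\log n=f(n)\sum_{p^{a}\|n}\log(p^{a})=\sum_{\substack{p^{a}m=n\\ p\nmid m}}f(m)f(p^{a})\log(p^{a}),$$
whence
$$S_2\le\sum_{\substack{m\in S(y),\ p\le y,\ a\ge1\\ p^{a}m\le x}}a\,f(m)f(p^{a})\log p=\sum_{\substack{m\in S(y),\ p\le y\\ pm\le x}}f(m)\log p\sum_{\substack{a\ge1\\ p^{a}m\le x}}a\,f(p^{a}).$$
For each fixed $m$ and $p$, we have $1\le a\le\log(x/m)/\log p$ and $f(p^{a})\le\tau_k(p^{a})\ll_k a^{k-1}\le(\log(x/m)/\log p)^{k-1}$. Hence
$$S_2\ll_k\sum_{m\in S(y)\cap[1,x/2]}f(m)\sum_{2\le p\le\min\{x/m,y\}}\frac{(\log(x/m))^{k+1}}{(\log p)^{k}}.$$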
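For any $z\ge2$, Chebyshev's estimate and partial summation imply that
$$\sum_{p\le z}\frac{(\log(x/m))^{k+1}}{(\log p)^{k}}\ll_k z\Big(\frac{\log(x/m)}{\log z}\Big)^{k+1}.$$
If $z=\min\{x/m,y\}$, then $\log(x/m)/\log z\le u$ and $z\le(x/m)^{1-\varepsilon}y^{\varepsilon}$. We thus conclude that
$$S_2\ll_k u^{k+1}y^{\varepsilon}x^{1-\varepsilon}\sum_{m\in S(y)}\frac{f(m)}{m^{1-\varepsilon}}.$$
Together with (16.12) and (16.13), this implies that
$$\sum_{n\in S(y)\cap[1,x]}f(n)\ll_k\frac{u^{k+1}y^{\varepsilon}x^{1-\varepsilon}}{\log x}\sum_{n\in S(y)}\frac{f(n)}{n^{1-\varepsilon}}.\tag{16.14}$$
We then follow the proof of the preceding theorem to bound the right side of (16.14) (taking $\varepsilon=w/\log y$ with $e^{w-1}/w=u$). This completes the proof.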


It is also possible to use the theory of large deviations to obtain an asymptotic estimate for $\Psi(x,y)$. This is done using an analogue of (16.3) obtained by Theorem 7.2, which implies that
$$\Psi(x,y)=\frac{1}{2\pi i}\int_{(\alpha)}\prod_{p\le y}\Big(1-\frac{1}{p^{s}}\Big)^{-1}x^{s}\,\frac{ds}{s}.$$
We choose $\alpha$ satisfying the analogue of the saddle-point equation (16.2), which is the equation
$$\sum_{p\le y}\frac{\log p}{p^{\alpha}-1}=\log x.$$

This argument was carried out by Hildebrand and Tenenbaum [103]. 


Theorem 16.5. For $x\ge y\ge2$ with $u=\log x/\log y$ and $\alpha$ as above, we have
$$\Psi(x,y)=\frac{x^{\alpha}\prod_{p\le y}(1-1/p^{\alpha})^{-1}}{\alpha\sqrt{2\pi L}}\Big(1+O\Big(\frac1u+\frac{\log y}{y}\Big)\Big),$$
where $L=-\frac{d}{ds}\Big|_{s=\alpha}\sum_{p\le y}\frac{\log p}{p^{s}-1}=\sum_{p\le y}\frac{p^{\alpha}(\log p)^2}{(p^{\alpha}-1)^2}$.

More details on the subject of smooth numbers can be found in Chapter III.5 of Tenenbaum's book [172], as well as in the survey article of Hildebrand and Tenenbaum.
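To see the ingredients of Theorem 16.5 in action, the sketch below does three things for arbitrary small parameters. It counts $\Psi(x,y)$ by brute force. It solves the saddle-point equation $\sum_{p\le y}\log p/(p^{\alpha}-1)=\log x$ by bisection, which works because the left side decreases in $\alpha$. Finally, it evaluates the main term of the theorem. We make no claim about the size of the error at this height; the comparison is only printed.

```python
import math


def smooth_numbers(primes: list[int], limit: int) -> list[int]:
    """All n <= limit whose prime factors lie in `primes` (n = 1 included)."""
    nums = [1]
    for p in primes:
        extended = []
        for n in nums:
            while n <= limit:
                extended.append(n)
                n *= p
        nums = extended
    return nums


def saddle_alpha(x: float, ps: list[int]) -> float:
    """Solve sum_{p <= y} log p / (p^alpha - 1) = log x by bisection.

    The left side decreases from +oo (alpha -> 0+) to 0 (alpha -> oo); the
    bracket [1e-9, 10] is an assumption that suits the parameters used here.
    """
    def h(a: float) -> float:
        return sum(math.log(p) / (p ** a - 1) for p in ps) - math.log(x)

    lo, hi = 1e-9, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ht_main_term(x: float, ps: list[int]) -> float:
    """x^alpha prod (1 - p^-alpha)^-1 / (alpha sqrt(2 pi L)), as in Theorem 16.5."""
    a = saddle_alpha(x, ps)
    zeta_y = math.prod(1 / (1 - p ** -a) for p in ps)
    L = sum(p ** a * math.log(p) ** 2 / (p ** a - 1) ** 2 for p in ps)
    return x ** a * zeta_y / (a * math.sqrt(2 * math.pi * L))


x, y = 10 ** 7, 50
ps = [p for p in range(2, y + 1) if all(p % d for d in range(2, math.isqrt(p) + 1))]
psi = len(smooth_numbers(ps, x))          # brute-force Psi(x, y)
print(psi, round(ht_main_term(x, ps)), saddle_alpha(x, ps))
```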


Exercises 


Exercise 16.1. Estimate the sum $\sum_{2\le n\le x}1/\omega(n)$ in two ways: (a) use Theorem 16.2; (b) use concentration-of-measure inequalities obtained by the method of large deviations.


Exercise 16.2. Fix C > 0 and let Ay = [],<,(1—1/p)*. For y > landk eZ 
with 0<k < CAy, prove that 

Pres(y)(w(n) = k) = G(k/Ay)e AER“ (1 + Oc (k/22)), 
where G(z) is defined in Theorem [16.2] 
Exercise 16.3. Let $\kappa>0$, and let $f$ be a multiplicative function such that $0\le f\le\tau_\kappa$.
(a) Use Theorem … to show that there is some constant $C=C(\kappa)$ such that
$$\sum_{n\in S(y)\cap[1,x]}\frac{f(n)}{n}\asymp_\kappa\exp\Big\{\sum_{p\le y}\frac{f(p)}{p}\Big\}\qquad\text{for }x\ge y^{C}\ge1.$$


(b) Give a new proof of the lower bound from Exercise b) that states that 


Part 4 


Sieve methods 


Chapter 17 


Twin primes 


The theory of Dirichlet L-functions allows us to make significant progress 
on our understanding of prime numbers. However, there are numerous im- 
portant questions about primes that seem to be intractable using Dirichlet 
series because they are fundamentally of non-multiplicative character. To 
study them, we go back to the basics and employ the most fundamental way 
of detecting primes: the sieve of Eratosthenes-Legendre. We illustrate some 
of the main ideas by discussing the famous twin prime conjecture. 


Twin primes arise naturally when studying the spacing distribution of 
primes. The first ten primes are 2, 3, 5, 7, 11, 13, 17, 19, 23 and 29, and the 
spacings between them are 1, 2, 2, 4, 2, 4, 2, 4, 6. The number 1 will never 
appear again as a spacing because all primes $p \ge 3$ are odd, and thus $p+1$
cannot be prime because it is even and > 2. By the same argument, no odd 
number > 1 will ever appear as a spacing between two consecutive primes. 
On the other hand, there is no obvious reason why the even numbers should 
not keep recurring. Already the number 2 appears four times in the above
list, and the number 4 appears three times. The number 6 appears once, 
but this is only because we have not looked far enough yet for a second 
appearance. Indeed, 31 and 37 are both primes and they differ by 6. 
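The spacings listed above can be reproduced with a few lines of Python. This is a sketch using a plain sieve of Eratosthenes, and `primes_up_to` is a hypothetical helper name. It also tallies the gaps between consecutive primes up to $10^4$, where the only odd gap is the single gap 1 between 2 and 3.

```python
import math
from collections import Counter
from itertools import compress


def primes_up_to(n):
    """Primes <= n via the sieve of Eratosthenes."""
    flags = bytearray([1]) * (n + 1)
    flags[:2] = bytes(min(2, n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return list(compress(range(n + 1), flags))


first_ten = primes_up_to(29)
gaps = [q - p for p, q in zip(first_ten, first_ten[1:])]
print(first_ten)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(gaps)       # [1, 2, 2, 4, 2, 4, 2, 4, 6]

# Tally the gaps between consecutive primes up to 10^4.
ps = primes_up_to(10**4)
tally = Counter(q - p for p, q in zip(ps, ps[1:]))
odd_gaps = sorted(g for g in tally if g % 2 == 1)
print(odd_gaps)   # [1]
print(tally[2], tally[4], tally[6])
```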


In 1849, de Polignac conjectured that any even number should appear 
infinitely many times as the gap between two consecutive primes. The pairs 
of primes that differ by 2 (and which are necessarily consecutive) are called 
twin primes. To study them, we define their counting function 


$\pi_2(x) = \#\{n \le x : n \text{ and } n+2 \text{ are both prime}\}.$


The twin prime conjecture, which is a special case of de Polignac’s conjecture, states that $\pi_2(x) \to \infty$ as $x \to \infty$.
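As a quick numerical sanity check (a sketch, not part of the argument), $\pi_2(x)$ can be computed directly from a sieve; the function names below are hypothetical. For $x = 100$ the pairs are $(3,5), (5,7), (11,13), (17,19), (29,31), (41,43), (59,61), (71,73)$, so $\pi_2(100) = 8$.

```python
import math


def prime_flags(n):
    """flags[k] == 1 iff k is prime, for 0 <= k <= n (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (n + 1)
    flags[:2] = bytes(min(2, n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return flags


def twin_prime_count(x):
    """pi_2(x) = #{n <= x : n and n + 2 are both prime}."""
    flags = prime_flags(x + 2)
    return sum(1 for n in range(2, x + 1) if flags[n] and flags[n + 2])


print(twin_prime_count(100))  # 8
for x in (10**4, 10**5, 10**6):
    # compare with the rough scale x / (log x)^2
    print(x, twin_prime_count(x), round(x / math.log(x) ** 2))
```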




Counting twin primes is a so-called additive problem: we are asking for solutions of the equation $q - p = 2$, where both $p$ and $q$ are prime numbers. Hence, the Dirichlet series approach, which was crucially based on the Euler product representation of the Dirichlet L-functions, is of limited use here. To make progress towards the twin prime conjecture, we revisit the combinatorial ideas of Chapter 2.


Sieving for twin primes 


We begin by rewriting $\pi_2(x)$ using the sieve of Eratosthenes-Legendre: if for some $n \in (\sqrt{x+2}, x]$ the product $n(n+2)$ has no prime factors $\le \sqrt{x+2}$, then neither $n$ nor $n+2$ has a prime divisor up to its square root, and so they must both be prime. The converse is also true. Hence,


$\pi_2(x) = \#\{n \le x : (n(n+2), P(\sqrt{x+2})) = 1\} + O(\sqrt{x})$

with $P(y) = \prod_{p\le y}p$ as usual. We would like to use the inclusion-exclusion principle to estimate $\pi_2(x)$, but the most direct application of this argument produces trivial bounds (see the discussion following Theorem 2.1). Limiting our goal to an upper bound for $\pi_2(x)$, we use Legendre’s idea to find that


(17.1)   $\pi_2(x) \le \pi_2(x,y) + O(y)$,

where

$\pi_2(x,y) = \#\{n \le x : (n(n+2), P(y)) = 1\}$

and $y \le \sqrt{x+2}$ is a parameter that we are free to choose. We then apply inclusion-exclusion as in (2.5) to find that


(17.2)   $\pi_2(x,y) = \sum_{d\mid P(y)}\mu(d)N_2(x;d)$,

where

$N_2(x;d) = \#\{n \le x : d \mid n(n+2)\}$.

To estimate the right-hand side of (17.2), we note that each interval of length $d$ contains exactly $\nu_2(d)$ integers $n$ such that $d \mid n(n+2)$, where

$\nu_2(d) := \#\{n \in \mathbb{Z}/d\mathbb{Z} : n(n+2) \equiv 0 \pmod d\}$.

Adapting the argument leading to (2.3), we deduce the formula

(17.3)   $N_2(x;d) = \frac{x\,\nu_2(d)}{d} + O(\nu_2(d))$.


The function $\nu_2$ is multiplicative by the Chinese Remainder Theorem, and on primes it equals

$\nu_2(p) = 1$ if $p = 2$, and $\nu_2(p) = 2$ if $p > 2$.

In particular, $\nu_2(d) \le 2^{\omega(d)} = \tau(d)$ for square-free integers $d$.
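The definition of $\nu_2$ and formula (17.3) can be checked by brute force. The Python sketch below (hypothetical helper names `nu2` and `N2`) computes $\nu_2(d)$ directly from its definition, verifies multiplicativity on a few coprime pairs, and confirms that $|N_2(x;d) - x\nu_2(d)/d| \le \nu_2(d)$ for integer $x$, which is the precise form of the $O(\nu_2(d))$ term.

```python
from math import gcd


def nu2(d):
    """nu_2(d) = #{n mod d : n(n+2) = 0 mod d}, straight from the definition."""
    return sum(1 for n in range(d) if n * (n + 2) % d == 0)


def N2(x, d):
    """N_2(x; d) = #{1 <= n <= x : d | n(n+2)}, by direct counting."""
    return sum(1 for n in range(1, x + 1) if n * (n + 2) % d == 0)


# Values on primes: 1 at p = 2 and 2 at odd p.
print([nu2(p) for p in (2, 3, 5, 7, 11)])  # [1, 2, 2, 2, 2]

# Multiplicativity on coprime pairs (Chinese Remainder Theorem).
for a, b in [(2, 15), (3, 7), (5, 11), (6, 35)]:
    assert gcd(a, b) == 1 and nu2(a * b) == nu2(a) * nu2(b)

# The error term in (17.3): |N_2(x; d) - x nu_2(d)/d| <= nu_2(d) for integer x.
for d in (6, 30, 210):
    for x in (1000, 1234, 5000):
        assert abs(N2(x, d) - x * nu2(d) / d) <= nu2(d)
```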




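The above discussion leads us to the asymptotic formula

(17.4)   $\pi_2(x,y) = \sum_{d\mid P(y)}\mu(d)\Big(\frac{x\,\nu_2(d)}{d} + O(\nu_2(d))\Big) = \frac{x}{2}\prod_{3\le p\le y}\Big(1-\frac{2}{p}\Big) + O(3^{\pi(y)})$.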
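For small $y$, the identity (17.2) and the size of the error in (17.4) can be confirmed by computer. In this sketch (all helper names are hypothetical), the exact count $\pi_2(x,y)$ is compared with the full inclusion-exclusion sum over $d \mid P(y)$, and the main term of (17.4) is checked against the bound $\sum_{d\mid P(y)}\nu_2(d) = \prod_{p\le y}(1+\nu_2(p)) = 2\cdot 3^{\pi(y)-1}$ for the accumulated error.

```python
import math
from itertools import combinations


def primes_up_to(n):
    """Primes <= n by trial division (adequate for the small n used here)."""
    return [p for p in range(2, n + 1) if all(p % q for q in range(2, math.isqrt(p) + 1))]


def nu2(d):
    return sum(1 for n in range(d) if n * (n + 2) % d == 0)


def N2(x, d):
    """#{1 <= n <= x : d | n(n+2)}, using one full period per block of d integers."""
    full, rem = divmod(x, d)
    return full * nu2(d) + sum(1 for n in range(1, rem + 1) if n * (n + 2) % d == 0)


def pi2_sieved(x, y):
    """pi_2(x, y) = #{n <= x : n(n+2) has no prime factor <= y}, by direct check."""
    ps = primes_up_to(y)
    return sum(1 for n in range(1, x + 1) if all(n * (n + 2) % p for p in ps))


def inclusion_exclusion(x, y):
    """Right-hand side of (17.2): sum of mu(d) N_2(x; d) over all d | P(y)."""
    ps = primes_up_to(y)
    return sum((-1) ** k * N2(x, math.prod(c))
               for k in range(len(ps) + 1) for c in combinations(ps, k))


x, y = 10**4, 13
ps = primes_up_to(y)
main = x / 2 * math.prod(1 - 2 / p for p in ps if p >= 3)  # main term of (17.4)
exact = pi2_sieved(x, y)
print(exact, inclusion_exclusion(x, y), round(main, 1))
```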


As in the proof of (2.9), we are forced to choose $y$ to be a multiple of $\log x$. We thus arrive at the estimate

$\pi_2(x) \ll \frac{x}{(\log\log x)^2}$.

Note, however, that this bound is worse than the trivial inequality $\pi_2(x) \le \pi(x) \ll x/\log x$.

It seems that we have quickly reached an impasse. This remained the 
state of affairs for more than a hundred years following Legendre’s work 
on the sieve of Eratosthenes. The great breakthrough in sieve theory that 
turned it from an interesting observation to an indispensable part of modern 
number theory was undertaken by Viggo Brun in 1915. His starting point 
was the realization that it is possible to replace the exact formula (17.4) by 
upper and lower bounds that involve far fewer summands, thus making
the remainder terms much more manageable. 


Brun’s first improvement of the sieve of Eratosthenes-Legendre arises from a better understanding of the mechanics of the inclusion-exclusion principle. Recall that $\pi_2(x,y)$ counts the number of $n \le x$ that are in the complement of the union of the sets $N_2(x;p) = \{n \le x : p \mid n(n+2)\}$ with $p \le y$. By the union bound, we have $x - \pi_2(x,y) \le T_1(x,y)$, where $T_1(x,y) = \sum_{p\le y}N_2(x;p)$. The expression $x - T_1(x,y)$ then serves as a first approximation to $\pi_2(x,y)$ that always underestimates its size, because there are numbers lying in the intersection of two of the sets $N_2(x;p)$. We then add to $x - T_1(x,y)$ the quantity $T_2(x,y) = \sum_{p_1<p_2\le y}N_2(x;p_1p_2)$. This leads to an overestimation of $\pi_2(x,y)$, the reason being that there are numbers lying in the intersection of three of the sets $N_2(x;p)$. At the next step we thus subtract from the expression $x - T_1(x,y) + T_2(x,y)$ the quantity $T_3(x,y) = \sum_{p_1<p_2<p_3\le y}N_2(x;p_1p_2p_3)$. The resulting expression $x - T_1(x,y) + T_2(x,y) - T_3(x,y)$ underestimates $\pi_2(x,y)$.


Continuing in the above fashion, we arrive at the Bonferroni inequalities

(17.5)   $\sum_{j=0}^{2\ell-1}(-1)^jT_j(x,y) \le \pi_2(x,y) \le \sum_{j=0}^{2\ell}(-1)^jT_j(x,y)$




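for any $\ell \in \mathbb{Z}_{\ge1}$, where $T_j(x,y) = \sum_{p_1<\cdots<p_j\le y}N_2(x;p_1\cdots p_j)$, with the convention $T_0(x,y) = \lfloor x\rfloor$. We rewrite these inequalities in terms of the Möbius function as

(17.6)   $\sum_{d\mid P(y),\ \omega(d)\le 2\ell-1}\mu(d)N_2(x;d) \le \pi_2(x,y) \le \sum_{d\mid P(y),\ \omega(d)\le 2\ell}\mu(d)N_2(x;d)$.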
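The Bonferroni inequalities (17.6) can likewise be observed numerically: truncating the inclusion-exclusion sum after an odd number of prime factors gives a lower bound, and after an even number an upper bound. The following Python sketch (hypothetical names; the small parameters $x = 10^4$, $y = 13$ are chosen only so that all $2^{\pi(y)}$ subsets can be enumerated) checks this for $\ell = 1, 2, 3$.

```python
import math
from itertools import combinations


def primes_up_to(n):
    """Primes <= n by trial division (adequate for the small n used here)."""
    return [p for p in range(2, n + 1) if all(p % q for q in range(2, math.isqrt(p) + 1))]


def nu2(d):
    return sum(1 for n in range(d) if n * (n + 2) % d == 0)


def N2(x, d):
    """#{1 <= n <= x : d | n(n+2)}, using one full period per block of d integers."""
    full, rem = divmod(x, d)
    return full * nu2(d) + sum(1 for n in range(1, rem + 1) if n * (n + 2) % d == 0)


def pi2_sieved(x, y):
    ps = primes_up_to(y)
    return sum(1 for n in range(1, x + 1) if all(n * (n + 2) % p for p in ps))


def truncated_sum(x, y, m):
    """Sum of mu(d) N_2(x; d) over d | P(y) with omega(d) <= m (Brun's truncation)."""
    ps = primes_up_to(y)
    return sum((-1) ** k * N2(x, math.prod(c))
               for k in range(min(m, len(ps)) + 1) for c in combinations(ps, k))


x, y = 10**4, 13
exact = pi2_sieved(x, y)
bounds = {}
for ell in (1, 2, 3):
    lower, upper = truncated_sum(x, y, 2 * ell - 1), truncated_sum(x, y, 2 * ell)
    bounds[ell] = (lower, upper)
    print(ell, lower, exact, upper)
```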


We must choose $\ell$ so that the following two requirements are met:

• $\ell$ must be small enough, so that the upper and lower bounds in (17.6) have far fewer terms than the right-hand side of (17.2). This will allow us to estimate $\pi_2(x,y)$ for $y$ much larger than $\log x$.

• $\ell$ must be large enough, so that the lower and upper bounds in (17.6) are close to the real size of $\pi_2(x,y)$.


With the above requirements in mind, we note that (17.6) implies that

$\pi_2(x,y) = \sum_{d\mid P(y),\ \omega(d)\le 2\ell-1}\mu(d)N_2(x;d) + O\Big(\sum_{d\mid P(y),\ \omega(d)=2\ell}N_2(x;d)\Big)$.

Since $N_2(x;d) = x\nu_2(d)/d + O(\nu_2(d))$ by (17.3), and since $\nu_2(d) \le 4^{\ell}$ when $d$ is square-free with $\omega(d) \le 2\ell$, we infer that


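(17.7)   $\pi_2(x,y) = x\sum_{d\mid P(y),\ \omega(d)\le 2\ell-1}\frac{\mu(d)\nu_2(d)}{d} + O\Big(\sum_{d\mid P(y),\ \omega(d)\le 2\ell}4^{\ell} + x\sum_{d\mid P(y),\ \omega(d)=2\ell}\frac{\nu_2(d)}{d}\Big)$.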


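We must choose $\ell$ large enough so that

(17.8)   $\sum_{d\mid P(y),\ \omega(d)\le 2\ell-1}\frac{\mu(d)\nu_2(d)}{d} \approx \prod_{p\le y}\Big(1-\frac{\nu_2(p)}{p}\Big) = \frac{1}{2}\prod_{3\le p\le y}\Big(1-\frac{2}{p}\Big)$.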
Theorem 6.1 implies that, when weighted with $\kappa^{\omega(n)}$, a “random” integer $n$ tends to have $\sim\kappa\log\log n$ prime factors. Motivated by this fact, we will eventually choose $\ell = c\log\log y$ for a large enough constant $c$. To prove that (17.8) holds for such a choice, we start by observing the inequalities


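(17.9)   $\sum_{d\mid P(y),\ \omega(d)\le 2\ell-1}\frac{\mu(d)\nu_2(d)}{d} \le \prod_{p\le y}\Big(1-\frac{\nu_2(p)}{p}\Big) \le \sum_{d\mid P(y),\ \omega(d)\le 2\ell}\frac{\mu(d)\nu_2(d)}{d}$,

which are analogous to (17.6). (In fact, they follow from (17.6) with $x = P(y)$, because we then have $N_2(x;d)/x = \nu_2(d)/d$ and $\pi_2(x,y)/x = \prod_{p\le y}(1-\nu_2(p)/p)$. See also Exercise 17.2.) Combining (17.7) and (17.9), we find that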


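(17.10)   $\pi_2(x,y) = x\prod_{p\le y}\Big(1-\frac{\nu_2(p)}{p}\Big) + O\Big(\sum_{d\mid P(y),\ \omega(d)\le 2\ell}4^{\ell} + x\sum_{d\mid P(y),\ \omega(d)=2\ell}\frac{\nu_2(d)}{d}\Big)$.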




Since $\nu_2(2) = 1$ and $\nu_2(p) = 2$ for $p \ge 3$, Mertens’ third estimate (Theorem 3.4(c)) implies that the main term of (17.10) has size $\asymp x/(\log y)^2$.

The first remainder term in (17.10) controls how many summands there 
are in the truncated version of the inclusion-exclusion principle (17.6). To 
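bound it, we simply note that if $d \mid P(y)$ and $\omega(d) \le 2\ell$, then $d \le y^{2\ell}$. Hence

$\sum_{d\mid P(y),\ \omega(d)\le 2\ell}4^{\ell} \le 4^{\ell}y^{2\ell} = (2y)^{2\ell}$.

This is small compared to the main term if $(2y)^{2\ell} \le x/(\log x)^3$, say.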


Finally, the second remainder term in (17.10) measures how close the 
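upper and lower bounds in (17.6) are. It should thus become small when $\ell$ becomes large enough. Indeed, we have

(17.11)   $\sum_{d\mid P(y),\ \omega(d)=2\ell}\frac{\nu_2(d)}{d} = \sum_{p_1<\cdots<p_{2\ell}\le y}\frac{\nu_2(p_1)\cdots\nu_2(p_{2\ell})}{p_1\cdots p_{2\ell}} \le \frac{1}{(2\ell)!}\Big(\sum_{p\le y}\frac{2}{p}\Big)^{2\ell}$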


by rearranging the $2\ell$ primes in all $(2\ell)!$ possible ways. The right side of (17.11) is $\asymp (\log y)^2\cdot\mathbb{P}(Z = 2\ell)$, where $Z$ is a Poisson random variable of parameter $\lambda = \sum_{p\le y}2/p \sim 2\log\log y$. Since $Z$ has mean value $\lambda$ and is concentrated around its mean value with high probability (see Exercise 1.9), we can make $\mathbb{P}(Z = 2\ell)$ as small as we want by letting the ratio $\ell/\sum_{p\le y}1/p$ be large enough. More concretely, using the inequality $n! \ge n^n/e^n$, we have


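$\sum_{d\mid P(y),\ \omega(d)=2\ell}\frac{\nu_2(d)}{d} \le \Big(\frac{e\sum_{p\le y}1/p}{\ell}\Big)^{2\ell} \le \frac{1}{(\log y)^3}$

for $\ell \ge 3.99\sum_{p\le y}1/p \sim 3.99\log\log y$. In addition, we must have $(2y)^{2\ell} \le x/(\log x)^3$. Such an $\ell$ exists as long as $y \le x^{1/(8\log\log x)}$ and $x$ is large enough.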
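The chain of inequalities in (17.11) and in the last display can be checked numerically. The left-hand side of (17.11) is the elementary symmetric polynomial of degree $2\ell$ in the weights $\nu_2(p)/p$, which the sketch below (hypothetical helper names; $y = 1000$ is an arbitrary choice) computes with the standard $O(nk)$ recurrence. It compares this with $(\sum_{p\le y}2/p)^{2\ell}/(2\ell)!$ and with $(e\sum_{p\le y}1/p/\ell)^{2\ell}$, and checks that the last quantity is below $(\log y)^{-3}$ at $\ell = \lceil 3.99\sum_{p\le y}1/p\rceil$.

```python
import math


def primes_up_to(n):
    """Primes <= n by trial division (adequate for the small n used here)."""
    return [p for p in range(2, n + 1) if all(p % q for q in range(2, math.isqrt(p) + 1))]


def elementary_symmetric(weights, k):
    """[e_0, ..., e_k] of the weights, via the recurrence e_j += e_{j-1} * w."""
    e = [1.0] + [0.0] * k
    for w in weights:
        for j in range(k, 0, -1):
            e[j] += e[j - 1] * w
    return e


y = 1000
ps = primes_up_to(y)
weights = [(1 if p == 2 else 2) / p for p in ps]  # nu_2(p)/p
S = sum(1 / p for p in ps)                        # sum_{p <= y} 1/p
ell_star = math.ceil(3.99 * S)
e = elementary_symmetric(weights, 2 * ell_star)

rows = []
for ell in range(1, ell_star + 1):
    lhs = e[2 * ell]                                                  # left side of (17.11)
    factorial_bound = (2 * S) ** (2 * ell) / math.factorial(2 * ell)  # right side of (17.11)
    stirling_bound = (math.e * S / ell) ** (2 * ell)                  # uses n! >= n^n / e^n
    rows.append((lhs, factorial_bound, stirling_bound))
print(ell_star, rows[-1][2], 1 / math.log(y) ** 3)
```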
To summarize, we have proved that 

(17.12)   $\pi_2(x,y) = \{1 + O(1/\log y)\}\,\frac{x}{2}\prod_{3\le p\le y}\Big(1-\frac{2}{p}\Big)$

when $2 \le y \le x^{1/(8\log\log x)}$ and $x$ is large. As an immediate corollary, we
have the following remarkable result due to Brun. 


Theorem 17.1. For $x \ge 2$, we have

$\pi_2(x) \ll \frac{x(\log\log x)^2}{(\log x)^2}$.

In particular, the series $\sum_{p,\,p+2 \text{ twin primes}}1/p$ converges.




Proof. The first part follows from (17.1) and (17.12) with $y = x^{1/(8\log\log x)}$. For the second part, we use partial summation. Alternatively, note that if $\mathcal{P}_2$ denotes the set of primes $p$ such that $p+2$ is also prime, then we have $\sum_{p\in\mathcal{P}_2\cap(2^j,2^{j+1}]}1/p \le 2^{-j}\pi_2(2^{j+1}) \ll (\log j)^2/j^2$ for $j \ge 2$ by the first part. Summing this inequality over all $j$ proves the convergence of $\sum_{p\in\mathcal{P}_2}1/p$.
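The partial sums of the convergent series in Theorem 17.1 grow very slowly. A minimal Python sketch (hypothetical function names) of the partial sums $\sum_{p\le x,\ p+2 \text{ prime}}1/p$:

```python
import math


def prime_flags(n):
    """flags[k] == 1 iff k is prime, for 0 <= k <= n (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (n + 1)
    flags[:2] = bytes(min(2, n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return flags


def brun_partial_sum(x):
    """Sum of 1/p over primes p <= x such that p + 2 is also prime."""
    flags = prime_flags(x + 2)
    return sum(1 / p for p in range(3, x + 1) if flags[p] and flags[p + 2])


partial = {x: brun_partial_sum(x) for x in (10**3, 10**4, 10**5)}
for x, s in partial.items():
    print(x, round(s, 6))
```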


Remark 17.2. The value of the series in the statement of Theorem 17.1 is called Brun’s constant, and its numerical calculation has an interesting history, as it led to the discovery of a bug in Intel’s® Pentium™ microprocessor by Nicely (see [184]).


The Cramér-Granville model 


It is important to take a moment and understand the quality of our bound on $\pi_2(x)$. Namely, we want to understand what the expected size of $\pi_2(x)$ is and how this compares to the estimate $\pi_2(x) \ll x(\log\log x)^2/(\log x)^2$.


To answer these questions, we go back to Cramér’s model. Recall the basic set-up: $(X_n)_{n\ge1}$ is a sequence of independent Bernoulli random variables such that $\mathbb{P}(X_n = 1) = 1/\log n$ for $n \ge 3$. This sequence is presumed to model the indicator function of the primes.


Consider now the random variable $\Pi_2(x) = \sum_{n\le x}X_nX_{n+2}$ as a random model of $\pi_2(x)$. A straightforward calculation reveals that $\mathbb{E}[\Pi_2(x)] \sim x/\log^2x$ as $x \to \infty$, thus suggesting that $\pi_2(x) \sim x/\log^2x$. However, as we mentioned when we originally introduced Cramér’s model (see page 4), the random variables $X_n$ are insensitive to arithmetic information. We should thus be careful when using them, because they may lead us to false conclusions. For example, the same argument as above also suggests that the number of $n \le x$ such that both $n$ and $n+1$ are prime is $\sim x/\log^2x$, a conclusion that is blatantly false.


To get around this issue, we modify Cramér’s model following an idea due to Granville. To capture the arithmetic structure of primes modulo small integers, our new model will consist of random variables $(Y_n)_{n=1}^{\infty}$ supported on $\mathcal{N} = \{n > y^2 : (n, P(y)) = 1\}$, where $y$ is a large parameter to be chosen later. Theorem implies that $\mathcal{N}$ contains approximately an $\alpha := \prod_{p\le y}(1-1/p)$ proportion of $\mathbb{N}$. Since an integer $n$ in $\mathcal{N}$ is presieved with all primes $\le y$, its chances of being prime are $\approx \alpha^{-1}/\log n > 1/\log n$.


In view of the above discussion, we define the Cramér-Granville model to be a sequence of independent Bernoulli random variables $(Y_n)_{n=1}^{\infty}$ with

(17.13)   $\mathbb{P}(Y_n = 1) = \alpha^{-1}\mathbb{1}_{n\in\mathcal{N}}/\log n$.




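The corresponding model for the number of twin primes up to $x$ is $\widetilde{\Pi}_2(x) = \sum_{n\le x}Y_nY_{n+2}$, for which we have

$\mathbb{E}[\widetilde{\Pi}_2(x)] = \alpha^{-2}\sum_{y^2<n\le x}\frac{\mathbb{1}_{(n(n+2),P(y))=1}}{\log n\,\log(n+2)}$.

If $y \le x^{1/(8\log\log x)}$, then (17.12) and partial summation imply that

$\mathbb{E}[\widetilde{\Pi}_2(x)] \sim \frac{x}{(\log x)^2}\prod_{p\le y}\Big(1-\frac{\nu_2(p)}{p}\Big)\Big(1-\frac{1}{p}\Big)^{-2}$.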


The product over primes converges absolutely as y — oo, since its factors 
are 1+ O(1/p?). Letting y — oo leads us to conjecture that 


$\pi_2(x) \sim c_2\frac{x}{(\log x)^2}$, where $c_2 = \prod_p\Big(1-\frac{\nu_2(p)}{p}\Big)\Big(1-\frac{1}{p}\Big)^{-2}$.

The constant $c_2$ is called the twin prime constant.
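The twin prime constant is easy to approximate by truncating the Euler product, since its factors are $1 + O(1/p^2)$; the factor at $p = 2$ equals 2, and for $p \ge 3$ each factor equals $1 - 1/(p-1)^2$. The Python sketch below (hypothetical names) does this and prints the conjectured main term $c_2x/(\log x)^2$ next to the true count at $x = 10^6$. It is only an illustration; lower-order terms hidden in the asymptotic are still noticeable at this size.

```python
import math


def prime_flags(n):
    """flags[k] == 1 iff k is prime, for 0 <= k <= n (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (n + 1)
    flags[:2] = bytes(min(2, n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return flags


def twin_prime_constant(y):
    """Truncated product prod_{p<=y} (1 - nu_2(p)/p)(1 - 1/p)^(-2)."""
    flags = prime_flags(y)
    c = 1.0
    for p in range(2, y + 1):
        if flags[p]:
            nu = 1 if p == 2 else 2
            c *= (1 - nu / p) / (1 - 1 / p) ** 2
    return c


c2 = twin_prime_constant(10**6)
x = 10**6
flags = prime_flags(x + 2)
count = sum(1 for n in range(2, x + 1) if flags[n] and flags[n + 2])
print(round(c2, 7), count, round(c2 * x / math.log(x) ** 2))
```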


In view of the above discussion, our bound on $\pi_2(x)$ is off by a factor of $(\log\log x)^2$. To remove this extra factor, we must find more efficient versions of (17.6), where the parameter $y$ is allowed to be even larger than $x^{1/(8\log\log x)}$, while still being able to control the total error after inserting (17.3). Doing so is a delicate task that requires a good understanding of which integers $d$ we can discard from the formula $\pi_2(x,y) = \sum_{d\mid P(y)}\mu(d)N_2(x;d)$ without losing too much information. In turn, this relies on a good grasp of the distribution of multiplicative functions that we studied in Chapters 15 and 16. We note, however, that we will not be able to obtain a non-trivial lower bound on $\pi_2(x,y)$ when $y = \sqrt{x+2}$, which is what would be required to settle the twin prime conjecture.


Exercises 


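Exercise 17.1 (The Bonferroni inequalities). Let $A_1, \dots, A_k$ be subsets of a finite set $X$. If $A = A_1^c\cap\cdots\cap A_k^c$ (complements taken in $X$), then show that

$|A| = |X| - \sum_{1\le k_1\le k}|A_{k_1}| + \cdots + (-1)^r\sum_{1\le k_1<\cdots<k_r\le k}|A_{k_1}\cap\cdots\cap A_{k_r}| + \Delta_r$

for all $r \in \mathbb{Z}_{\ge0}$, where $(-1)^{r+1}\Delta_r \ge 0$. [Hint: Show the identity $|A| = |X| - \sum_{k_1=1}^{k}|A_{k_1}\setminus\bigcup_{k_1<\ell\le k}A_\ell|$ by dividing the elements $a$ of $X\setminus A$ according to the largest index $k_1$ such that $a \in A_{k_1}$. Then iterate this identity.]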


Exercise 17.2. Let $\tilde\nu_2$ denote the completely multiplicative function with $\tilde\nu_2(p) = \nu_2(p)$, and define a probability measure on the set $S(y)$ of $y$-smooth integers by

$\mathbb{P}(A) = \prod_{p\le y}\Big(1-\frac{\nu_2(p)}{p}\Big)\cdot\sum_{n\in A}\frac{\tilde\nu_2(n)}{n}$.

If $A_d = \{n \in S(y) : d \mid n\}$ with $d \mid P(y)$, prove that $\mathbb{P}(A_d) = \nu_2(d)/d$ and deduce (17.9). [Hint: The Bonferroni inequalities have a measure-theoretic version.]

Exercise 17.3. Adapt Brun’s method to prove the following estimates:

(a) If $m$ is $x^{1/(4\log\log x)}$-smooth, then

$\#\{n \le x : (n,m) = 1\} \sim x\cdot\frac{\varphi(m)}{m}$   $(x \to \infty)$.

(b) $\#\{x-y < p \le x\} \ll y\log\log y/\log y$ for $x \ge y \ge 3$.

(c) $\#\{n \le x : n^2+1 \text{ is prime}\} \ll x\log\log x/\log x$.

Exercise 17.4. Let $h = (h_1, \dots, h_k)$ be a $k$-tuple of distinct integers. For each prime $p$, define $\nu_h(p)$ to be the number of congruence classes mod $p$ occupied by the numbers $h_1, \dots, h_k$, and set

$\mathfrak{S}(h) := \prod_p\Big(1-\frac{\nu_h(p)}{p}\Big)\Big(1-\frac{1}{p}\Big)^{-k}$.

(a) Show that $\mathfrak{S}(h)$ is an absolutely convergent Euler product.

(b) The $k$-tuple $h$ is called admissible if $\nu_h(p) < p$ for all primes $p$. Show that this is equivalent to having $\mathfrak{S}(h) > 0$.

(c) If $h$ is an admissible $k$-tuple, the Hardy-Littlewood conjecture states that

(17.14)   $\#\{n \le x : n+h_1, \dots, n+h_k \text{ are all prime}\} \sim \mathfrak{S}(h)\cdot\frac{x}{(\log x)^k}$

when $x \to \infty$. Use the Cramér-Granville model to justify this conjecture.
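For Exercise 17.4, the quantities $\nu_h(p)$ and $\mathfrak{S}(h)$ are straightforward to compute. In this Python sketch (hypothetical helper names), admissibility only needs to be tested at primes $p \le k$, since $\nu_h(p) \le k < p$ for $p > k$, and $\mathfrak{S}(h)$ is approximated by truncating the Euler product; for $h = (0,2)$ this recovers the twin prime constant $c_2$.

```python
import math
from itertools import compress


def primes_up_to(n):
    """Primes <= n via the sieve of Eratosthenes."""
    flags = bytearray([1]) * (n + 1)
    flags[:2] = bytes(min(2, n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return list(compress(range(n + 1), flags))


def nu_h(h, p):
    """Number of residue classes mod p occupied by h_1, ..., h_k."""
    return len({hi % p for hi in h})


def is_admissible(h):
    """nu_h(p) < p for all p; only p <= k needs checking since nu_h(p) <= k."""
    return all(nu_h(h, p) < p for p in primes_up_to(len(h)))


def singular_series(h, y):
    """Truncation at p <= y of prod_p (1 - nu_h(p)/p)(1 - 1/p)^(-k)."""
    k = len(h)
    s = 1.0
    for p in primes_up_to(y):
        s *= (1 - nu_h(h, p) / p) / (1 - 1 / p) ** k
    return s


for h in [(0, 2), (0, 2, 4), (0, 2, 6), (0, 4, 6, 10)]:
    print(h, is_admissible(h), singular_series(h, 10**5))
```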


Exercise 17.5. Let $N \in \mathbb{Z}_{\ge2}$. Use the Cramér-Granville model to predict an
asymptotic formula for the number of pairs of primes (p,q) such that p+ q = 2N. 


Exercise 17.6 (Montgomery’s conjecture). Use a suitable version of the Cramér-Granville model to argue that, for each fixed $\varepsilon > 0$, we have

$\Big|\pi(x;q,a) - \frac{\mathrm{li}(x)}{\varphi(q)}\Big| \ll_\varepsilon x^{\varepsilon}(x/q)^{1/2}$

uniformly for $x \ge q \ge 1$ and $(a,q) = 1$. In particular,

(17.15)   $\pi(x;q,a) \sim \frac{x}{\varphi(q)\log x}$

if $x$ and $q$ tend to infinity at a rate such that $q \le x^{1-\varepsilon}$.


Chapter 18 


The axioms of sieve 
theory 


The general problem in sieve theory asks for bounds on the quantity 
$S(A,P) = \#\{a \in A : (a,P) = 1\}$,

where $A$ is a finite set of integers, $P$ is a finite set of primes, and the notation $(a,P) = 1$ means that $a$ has no prime factors from $P$.


It is convenient to generalize this set-up further. Given a sequence of weights $A = (a_n)_{n=1}^{\infty} \subset \mathbb{R}_{\ge0}$ with $\sum_{n=1}^{\infty}a_n < \infty$, we define

$S(A,P) = \sum_{(n,P)=1}a_n$.


This incorporates the quantity $\#\{a \in A : (a,P) = 1\}$ by taking $a_n = \mathbb{1}_A(n)$.
We will switch back and forth between the two definitions, using the more 
general one when discussing theoretical aspects of the sieve, and the more 
specialized one when discussing concrete applications. This ambiguity in 
the notation helps us avoid the introduction of unnecessary new symbols. 


Various important questions can be written in the above language. 


Example 18.1. If $A = \{x-y < n \le x\}$ for some $x \ge y+1 \ge 2$ and $P = \{p \le \sqrt{x}\}$, then $S(A,P)$ counts primes in the interval $(x-y, x]$. If, for instance, $x = N^2$ and $y = 2N$ for some $N \in \mathbb{Z}_{\ge3}$, then proving that $S(A,P) > 0$ is equivalent to Landau’s conjecture that there is always a prime number between $(N-1)^2$ and $N^2$.




Example 18.2. If $A = \{n(n+2) : 1 \le n \le x\}$ and $P = \{p \le \sqrt{x+2}\}$, then $S(A,P)$ counts integers $n \in (\sqrt{x+2}, x]$ such that both $n$ and $n+2$ are prime numbers, that is to say, $(n, n+2)$ is a pair of twin primes.


Example 18.3. Generalizing the above examples, let $A = \{f(n) : x-y < n \le x\}$, where $x \ge y \ge 1$ and $f$ is a polynomial over $\mathbb{Z}$. Assume further that $f = f_1\cdots f_r$, where $f_1, \dots, f_r$ are irreducible polynomials over $\mathbb{Z}$ (i.e., they are primitive and irreducible over $\mathbb{Q}$), and let $P = \{p \le z\}$, where $z = \max\{|f_j(n)|^{1/2} : x-y < n \le x,\ 1 \le j \le r\}$. (Note that $z \sim cx^{d/2}$ when $x \to \infty$, where $d = \max_{1\le j\le r}\deg(f_j)$ and $c$ is some appropriate positive constant.) Then $S(A,P)$ counts integers $n \in (x-y, x]$ such that $f_1(n), \dots, f_r(n)$ are all primes $> z$. Since $f_j(n) \ll n^{\deg(f_j)}$, any such $n$ must satisfy the inequalities $x \ge n \gg x^{d/(2d')}$, where $d' = \min_{1\le j\le r}\deg(f_j)$. Note that for this range to be non-empty we must have $d' \ge d/2$.

For instance, if $f(n) = n^2+1$, $y = x$ and $P = \{p \le \sqrt{x^2+1}\}$, then proving that $S(A,P) > 0$ for infinitely many values of $x$ would imply Landau’s conjecture that there are infinitely many primes of the form $n^2+1$.


Example 18.4. We can also count twin primes using an alternative set-up: we take $A = \{p+2 : p \le x\}$ and $P = \{p \le \sqrt{x+2}\}$, so that $S(A,P)$ counts primes $p \le x$ such that $p+2$ is a prime $> \sqrt{x+2}$. As we will see later on, this alternative formulation yields better results on twin primes.


Example 18.5. If $A = \{2N-p : p \le N\}$ and $P = \{p \le \sqrt{2N}\}$ for some integer $N \ge 2$, then $S(A,P)$ counts primes $p \le N$ such that $2N-p$ is also prime. In particular, $S(A,P) > 0$ if and only if we can write $2N$ as the sum of two primes (the smaller of which we take to be $p$). Proving this statement for all $N \ge 2$ is Goldbach’s conjecture.


Example 18.6. If $A = \{p-1 : p \le x\}$ and $P = \{p' \le x : p' \equiv 3 \pmod 4\}$, then $S(A,P)$ counts primes $p \le x$ such that $p-1$ has no prime factors $\equiv 3 \pmod 4$. In particular, $p-1$ can be written as the sum of two squares.

Using a trick due to Iwaniec, we can reduce the size of the primes in $P$: we take $A = \{p-1 : p \le x,\ p \equiv 3 \pmod 8\}$ and $P = \{p' \le \sqrt{x} : p' \equiv 3 \pmod 4\}$. We claim that all primes $p$ counted by $S(A,P)$ are such that $p-1$ is the sum of two squares. It suffices to prove that $p-1$ has no prime factors that are $\equiv 3 \pmod 4$. Note that $(p-1)/2 \equiv 1 \pmod 4$. Hence, the number $(p-1)/2$ is divisible by an even number of primes $\equiv 3 \pmod 4$ (counted with multiplicity). But $p-1 \le x-1$ can have at most one prime factor $> \sqrt{x}$. We thus conclude that if $p \equiv 3 \pmod 8$ and $(p-1, P) = 1$, then $p-1$ has no prime factors $\equiv 3 \pmod 4$. In particular, $p-1$ can be written as the sum of two squares.
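Iwaniec’s trick can be tested on a computer. The sketch below (hypothetical helper names; $x = 10^5$ is an arbitrary choice) sieves $p - 1$, for primes $p \le x$ with $p \equiv 3 \pmod 8$, only by the primes $p' \le \sqrt{x}$ with $p' \equiv 3 \pmod 4$, and then checks by brute force that every surviving $p - 1$ is a sum of two squares.

```python
import math
from itertools import compress


def primes_up_to(n):
    """Primes <= n via the sieve of Eratosthenes."""
    flags = bytearray([1]) * (n + 1)
    flags[:2] = bytes(min(2, n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return list(compress(range(n + 1), flags))


def is_sum_of_two_squares(n):
    """Brute-force test whether n = a^2 + b^2 for some integers a, b >= 0."""
    return any(math.isqrt(n - a * a) ** 2 == n - a * a for a in range(math.isqrt(n) + 1))


def iwaniec_survivors(x):
    """Primes p <= x, p = 3 mod 8, such that p - 1 has no prime factor p' <= sqrt(x), p' = 3 mod 4."""
    ps = primes_up_to(x)
    small = [q for q in ps if q * q <= x and q % 4 == 3]
    return [p for p in ps if p % 8 == 3 and all((p - 1) % q for q in small)]


x = 10**5
survivors = iwaniec_survivors(x)
failures = [p for p in survivors if not is_sum_of_two_squares(p - 1)]
# By the argument of Example 18.6, the list of failures is empty.
print(len(survivors), failures)
```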


Typically, we study $S(A,P)$ in a general axiomatic framework. We
introduce and discuss each of the three sieve axioms in the following sections. 




Axiom 1: Generalizing the Kubilius model


By Möbius inversion, we have

(18.1)   $S(A,P) = \sum_n a_n\sum_{d\mid(n,P)}\mu(d) = \sum_{d\mid P}\mu(d)A_d$,

where the notation $d \mid P$ means that $d \mid \prod_{p\in P}p$ (i.e., $d$ is square-free and all of its prime factors lie in $P$), and

(18.2)   $A_d := \sum_{n\equiv 0 \pmod d}a_n = \sum_{m\ge1}a_{dm}$.

In the important case when we are sieving a set $A$ instead of a sequence, we have

$A_d = \#\{a \in A : a \equiv 0 \pmod d\}$.
In order to proceed further, we must estimate Ag asymptotically. We work 
out such an estimate in each of the examples discussed above: 


Example 18.1: Here $A = \{x-y < n \le x\}$, so $A_d = y/d + O(1)$.


Example 18.2: We have $A = \{n(n+2) : n \le x\}$. Thus, relation (17.3) implies that $A_d = x\cdot\nu_2(d)/d + O(\nu_2(d))$, where $\nu_2(d)$ counts the roots of the polynomial $t(t+2)$ mod $d$.


Example 18.3: Since $A = \{f(n) : x-y < n \le x\}$ with $x \ge y \ge 1$ and $f(t) \in \mathbb{Z}[t]$, a straightforward generalization of (17.3) implies that $A_d = y\cdot\nu_f(d)/d + O(\nu_f(d))$ with $\nu_f(d) = \#\{n \in \mathbb{Z}/d\mathbb{Z} : f(n) \equiv 0 \pmod d\}$.


Example 18.4: Here we have $A = \{p+2 : p \le x\}$, so that $A_d = \pi(x;d,-2)$. If $d$ is even, then $A_d \le 1$. On the other hand, if $d$ is odd and $d \le (\log x)^C$ for some fixed $C > 0$, the Siegel-Walfisz theorem (Theorem 12.1) implies that $A_d = \mathrm{li}(x)/\varphi(d) + O_C(xe^{-c\sqrt{\log x}})$, where $c$ is an absolute positive constant. Note that if we assume the Generalized Riemann Hypothesis, Exercise 11.2 implies the improved estimate $A_d = \mathrm{li}(x)/\varphi(d) + O(\sqrt{x}\log x)$ for odd $d$.


Example 18.5: Here we have $A = \{2N-p : p \le N\}$. Consequently, $A_d = \pi(N;d,2N)$. When $(d,2N) > 1$, any prime $p \equiv 2N \pmod d$ must divide $2N$, whence $A_d \le \omega(2N) \ll \log(2N)$. On the other hand, if $(d,2N) = 1$ with $d \le (\log N)^C$, we have the estimate $A_d = \mathrm{li}(N)/\varphi(d) + O_C(Ne^{-c\sqrt{\log N}})$.


Example 18.6: In the second part, we took $A = \{p-1 : p \le x,\ p \equiv 3 \pmod 8\}$ and $P = \{p' \le \sqrt{x} : p' \equiv 3 \pmod 4\}$. Therefore, $A_d = \pi(x;8d,a_d)$ whenever $d \mid P$, where $a_d$ is determined by the Chinese Remainder Theorem and the congruences $a_d \equiv 1 \pmod d$ and $a_d \equiv 3 \pmod 8$. We thus conclude that $A_d = \mathrm{li}(x)/\varphi(8d) + O_C(xe^{-c\sqrt{\log x}})$ for $d \le (\log x)^C$ with $d \mid P$.




Observe that in all of the above examples there is a quantity $X$ and a multiplicative function $\nu$ such that

(18.3)   $A_d \approx \frac{\nu(d)}{d}\cdot X$ for all small enough $d \mid P$.

We summarize in the following table the values of $X$ and $\nu$ for each of our six examples:

Example | $X$ | $\nu(d)$
18.1 | $y$ | $1$
18.2 | $x$ | $\nu_2(d)$
18.3 | $y$ | $\nu_f(d)$
18.4 | $\mathrm{li}(x)$ | $\mathbb{1}_{(d,2)=1}\,d/\varphi(d)$
18.5 | $\mathrm{li}(N)$ | $\mathbb{1}_{(d,2N)=1}\,d/\varphi(d)$
18.6 | $\mathrm{li}(x)/4$ | $d/\varphi(d)$
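The approximation for $A_d$ in Example 18.4 can be inspected numerically. To keep the sketch self-contained, it compares $A_d = \pi(x;d,-2)$ with $\pi(x)/\varphi(d)$ rather than $\mathrm{li}(x)/\varphi(d)$ (an assumption made only to avoid implementing $\mathrm{li}$); the helper names are hypothetical. It also confirms that $A_d \le 1$ for even $d$.

```python
import math
from itertools import compress


def primes_up_to(n):
    """Primes <= n via the sieve of Eratosthenes."""
    flags = bytearray([1]) * (n + 1)
    flags[:2] = bytes(min(2, n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return list(compress(range(n + 1), flags))


def phi(d):
    """Euler's totient function by trial division."""
    result, m, q = d, d, 2
    while q * q <= m:
        if m % q == 0:
            while m % q == 0:
                m //= q
            result -= result // q
        q += 1
    if m > 1:
        result -= result // m
    return result


def A_d(primes, d):
    """A_d = #{p <= x : p + 2 = 0 mod d} = pi(x; d, -2)."""
    return sum(1 for p in primes if (p + 2) % d == 0)


x = 10**6
ps = primes_up_to(x)
ratios = {}
for d in (3, 5, 7, 15, 21):
    ratios[d] = A_d(ps, d) / (len(ps) / phi(d))
    print(d, A_d(ps, d), round(len(ps) / phi(d)), round(ratios[d], 4))
even = [A_d(ps, d) for d in (2, 4, 6, 8, 10)]
print(even)  # [1, 1, 0, 0, 0]
```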
Note that

(18.4)   $\nu(p) < p$ for all $p \in P$

in each of Examples 18.1–18.6 (with the possible exception of Example 18.3 if there is a prime $p$ such that $t^p - t \mid f(t)$ over the finite field $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$). Relation (18.4) means that $A_p$ is asymptotically smaller than $X \approx A_1$, which we certainly need if we are to extract elements $a_n$ of the sequence $A$ with $n$ having no prime factors in $P$.


We denote the remainder term in the approximation (18.3) by

$r_d := A_d - X\cdot\nu(d)/d$.

Since $d \mid P$ in all summands of (18.1), we only need to control $r_d$ when $d \mid P$.


We thus arrive at the first axiom of sieve theory. 


Axiom 1. There is a multiplicative function $\nu$, a parameter $X$ and a sequence of remainders $(r_d)_{d\mid P}$ such that

$A_d = \frac{\nu(d)}{d}X + r_d$ for all $d \mid P$

and

$\nu(p) < p$ for all $p \in P$.


In the spirit of the Kubilius model of the integers, the function $\nu(d)/d$ can be interpreted as a multiplicative density function that we denote by

(18.5)   $\delta(d) = \frac{\nu(d)}{d} \in [0,1]$ for $d \mid P$.

Indeed, if we equip $\mathbb{N}$ with the probability measure

$\mathbb{P}(R) = \frac{\sum_{n\in R}a_n}{\sum_{n\ge1}a_n}$,

then the event $E_d = \{n \in \mathbb{N} : d \mid n\}$ occurs with probability

$\mathbb{P}(E_d) = A_d/A_1 \approx \delta(d)$

when (18.3) holds with sufficiently small remainder $r_d$. Hence, our assumption that $\nu$ is multiplicative means that the events $(E_p)_{p\in P}$ are quasi-independent. This leads us to guess that

(18.6)   $S(A,P) \approx X\prod_{p\in P}(1-\delta(p)) = X\prod_{p\in P}\Big(1-\frac{\nu(p)}{p}\Big)$.

The same relation can also be seen by replacing $A_d$ by $\delta(d)X + r_d$ in (18.1) and ignoring all the remainder terms.


In Theorem 2.1 we saw that the above guess is true when $A = \{n \le x\}$ and $P \subset [1, \log x]$. This theorem will be improved and generalized in the next chapter, thus establishing (18.6) when $P \subset [1, X^{o(1)}]$ for various sequences $A$. On the other hand, relation (18.6) fails when $A = \{n \le x\}$ and $P = \{p \le \sqrt{x}\}$, as we discussed in Remark 3.5. This reflects the failure of the independence hypothesis for divisibility by large primes: indeed, if $p_1 > p_2 > p_3 > x^{1/3}$, then there is no integer $n \le x$ that is simultaneously divisible by $p_1$, $p_2$ and $p_3$. In particular, $E_{p_1}\cap E_{p_2}\cap E_{p_3} = \emptyset$, which means that the events $E_{p_1}$, $E_{p_2}$, $E_{p_3}$ are interrelated.
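The contrast between small and large sieving ranges can be seen numerically. The Python sketch below (hypothetical names) counts the $n \le x$ free of prime factors $\le y$ and divides the guess $x\prod_{p\le y}(1-1/p)$ by the true count. For $y = 13 \le \log x$ the ratio is essentially 1, while for $y = \sqrt{x}$ the guess overshoots by a few per cent at $x = 10^6$; by Mertens’ third estimate and the prime number theorem, the ratio in the latter case tends to $2e^{-\gamma} \approx 1.12$ as $x \to \infty$.

```python
import math


def primes_up_to(n):
    """Primes <= n by trial division (adequate for the small n used here)."""
    return [p for p in range(2, n + 1) if all(p % q for q in range(2, math.isqrt(p) + 1))]


def rough_count_and_guess(x, y):
    """Return (#{n <= x : n has no prime factor <= y}, x * prod_{p<=y} (1 - 1/p))."""
    ps = primes_up_to(y)
    flags = bytearray([1]) * (x + 1)  # flags[n] = 1 while n survives the sieve
    flags[0] = 0
    for p in ps:
        flags[p::p] = bytes(len(range(p, x + 1, p)))
    return sum(flags), x * math.prod(1 - 1 / p for p in ps)


x = 10**6
results = {}
for y in (13, math.isqrt(x)):
    actual, guess = rough_count_and_guess(x, y)
    results[y] = (actual, guess)
    print(y, actual, round(guess), round(guess / actual, 4))
print(round(2 * math.exp(-0.5772156649015329), 4))  # limiting ratio 2 e^{-gamma} for y = sqrt(x)
```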


Axiom 2: The sifting dimension


A very useful and intuitive way to think of the quantity $\nu(p)$ is as the number of residue classes we must “remove/sieve out” modulo $p$ in order to capture elements of our sequence $A$ that are primes (or products of a few primes). For instance, in Example 18.2 we want both $n$ and $n+2$ to be primes. Hence, we must sieve out all integers lying in the congruence classes $0 \pmod p$ and $-2 \pmod p$. Correspondingly, we have

(18.7)   $\nu_2(p) = \#\{0 \bmod p,\ -2 \bmod p\} = 1$ if $p = 2$, and $\nu_2(p) = 2$ if $p > 2$.


Similarly, consider the set-up of Example 18.3 with $f(t) = t^2+1$. In order to capture prime values of this polynomial, we must remove from the set $\{n \le x\}$ all integers that lie in a congruence class $a \pmod p$ such that $a^2+1 \equiv 0 \pmod p$ for some prime $p \le \sqrt{x^2+1}$. We then find that

(18.8)   $\nu_f(p) = 1$ if $p = 2$; $\nu_f(p) = 2$ if $p \equiv 1 \pmod 4$; $\nu_f(p) = 0$ if $p \equiv 3 \pmod 4$.
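Relation (18.8) can be confirmed for small primes by brute force; this Python sketch (hypothetical helper names) counts the roots of $a^2 + 1$ modulo $p$ directly and compares them with the three cases above.

```python
import math


def primes_up_to(n):
    """Primes <= n by trial division (adequate for the small n used here)."""
    return [p for p in range(2, n + 1) if all(p % q for q in range(2, math.isqrt(p) + 1))]


def nu_f(p):
    """Number of roots of a^2 + 1 modulo p."""
    return sum(1 for a in range(p) if (a * a + 1) % p == 0)


def predicted(p):
    """Right-hand side of (18.8)."""
    if p == 2:
        return 1
    return 2 if p % 4 == 1 else 0


mismatches = [p for p in primes_up_to(1000) if nu_f(p) != predicted(p)]
print(mismatches)  # []
```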
Naturally, the larger $\nu(p)$ gets, the harder it is to estimate $S(A,P)$. A simple way of ensuring that $\nu(p)$ does not get too large is to assume the existence of a parameter $k > 0$ such that

(18.9)   $\nu(p) \le k$ for all $p \in P$.


For each of our six examples, we have:

Example | $\nu(p)$ | $k$
18.1 | $1$ | $1$
18.2 | $2 - \mathbb{1}_{p=2}$ | $2$
18.3 | $\nu_f(p)$ | $\deg(f)$
18.4 | $\mathbb{1}_{p\ne2}\,p/(p-1)$ | $3/2$
18.5 | $\mathbb{1}_{p\nmid 2N}\,p/(p-1)$ | $3/2$
18.6 | $p/(p-1)$ | $3/2$


where in Example 18.3 we used the fact that a polynomial of degree $d$ has at most $d$ roots over the finite field $\mathbb{F}_p$, and in Example 18.6 that $\min P = 3$.


As we remarked above, the smaller $k$ is, the easier it is to estimate $S(A,P)$. With this in mind, note that in Examples 18.4–18.6 we may take $k \approx 1$ when $p$ is large enough. However, in Example 18.3 the inequality $\nu_f(p) \le \deg(f)$ is sharp in full generality, because a polynomial $f(t)$ factors completely mod $p$ for a positive proportion of the primes by the Chebotarev Density Theorem (see Theorems 8.3 (p. 47) and 13.4 (p. 545)], as well as relation (18.8) above for a concrete example).

It turns out that for many applications we only need an averaged form of (18.9) that allows us to reduce the value of $k$. There are various ways of averaging (18.9). A very useful one is the following.


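Axiom 2. There are constants $\kappa > 0$ and $C > 0$ such that

$\prod_{p\in P\cap(y_1,y_2]}\Big(1-\frac{\nu(p)}{p}\Big)^{-1} \le \Big(\frac{\log y_2}{\log y_1}\Big)^{\kappa}\Big(1+\frac{C}{\log y_1}\Big)$

uniformly for $3/2 \le y_1 \le y_2 \le \max P$.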
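To see Axiom 2 in action for Example 18.2, where $\kappa = 2$, the following Python sketch (hypothetical names; $y_2 = 10^5$ is an arbitrary choice) computes the ratio of the product on the left-hand side to $(\log y_2/\log y_1)^{2}$ for several values of $y_1$. The ratios stay bounded, as the axiom requires.

```python
import math
from itertools import compress


def primes_up_to(n):
    """Primes <= n via the sieve of Eratosthenes."""
    flags = bytearray([1]) * (n + 1)
    flags[:2] = bytes(min(2, n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return list(compress(range(n + 1), flags))


def iwaniec_ratio(y1, y2, kappa=2):
    """prod_{y1 < p <= y2} (1 - nu_2(p)/p)^(-1) divided by (log y2 / log y1)^kappa."""
    prod_inv = 1.0
    for p in primes_up_to(int(y2)):
        if p > y1:
            prod_inv /= 1 - (1 if p == 2 else 2) / p
    return prod_inv / (math.log(y2) / math.log(y1)) ** kappa


ratios = {y1: iwaniec_ratio(y1, 10**5) for y1 in (1.5, 3, 10, 100, 1000)}
for y1, r in ratios.items():
    print(y1, round(r, 4))
```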


Axiom 2 is often called the Iwaniec condition¹; the infimum of the values of $\kappa$ satisfying it is called the sifting dimension of $A$ with respect to the set of primes $P$. Note that if (18.9) holds and there is some $\varepsilon \in (0,1]$ such that $\nu(p)/p \le 1-\varepsilon$ for all $p \in P$, then Axiom 2 holds with $\kappa = k$ and $C = C(k,\varepsilon)$ by Mertens’ third estimate (Theorem 3.4(c)).


For each of our six examples, we have:

Example | $\nu(p)$ | $\kappa$
18.1 | $1$ | $1$
18.2 | $2 - \mathbb{1}_{p=2}$ | $2$
18.3 | $\nu_f(p)$ | $r$
18.4 | $\mathbb{1}_{p\ne2}\,p/(p-1)$ | $1$
18.5 | $\mathbb{1}_{p\nmid 2N}\,p/(p-1)$ | $1$
18.6 | $p/(p-1)$ | $1/2$

where $r$ is the number of irreducible factors of $f$ over $\mathbb{Q}$, and in Example 18.6 we have $\kappa = 1/2$ because $P \subset \{p \equiv 3 \pmod 4\}$.


Often, we must assume a more precise version of Axiom 2.


¹ Often, the Iwaniec condition refers to a slightly weaker version of Axiom 2 where the factor $1 + C/\log y_1$ is replaced by some absolute constant $C'$.




Axiom 2′. There are constants $\kappa, k > 0$ and $\varepsilon \in (0,1]$ such that

$\sum_{p\in P\cap[1,w]}\frac{\nu(p)\log p}{p} = \kappa\log w + O(1)$ for all $w \le \max P$

and

$\nu(p) \le \min\{(1-\varepsilon)p,\ k\}$ for all $p \in P$.

It is easy to show that Axiom 2′ implies Axiom 2 for some $C = C(\varepsilon, k, \kappa)$.


Axiom 3: The level of distribution of A


In order for Axiom 1 to be meaningful, we must be able to show that the
quantities rq are small compared to the alleged main term X - v(d)/d. In 
practice, we only need to show such an estimate on average. The precise 
condition that we will need is the following. 


Axiom 3. There are constants $A > 0$ and $m \in \mathbb{N}$, and a quantity $D \ge 1$ such that

$\sum_{d\le D,\ d\mid P}\tau_m(d)|r_d| \le \frac{X}{(\log X)^A}$.


In Chapter 19 we will appeal to this axiom with $m = 1$, whereas in Chapter 21 we will use it with $m = 3$. The quantity $D$ is called the level of
distribution of the sequence A. It is a measure of how well we can control 
the distribution of A among the progressions 0 (mod d). 


Example 18.7. When $A = \{f(n) : x-y < n \le x\}$ for a polynomial $f(t) \in \mathbb{Z}[t]$, as in Example 18.3 above (which incorporates Examples 18.1 and 18.2 as well), then $r_d = O(\nu_f(d))$. Since $\nu_f(p) \le \deg(f) =: k$, we find that $r_d = O(\tau_k(d))$ for $d \mid P$, so that

$\sum_{d\le D,\ d\mid P}\tau_m(d)|r_d| \ll \sum_{d\le D}\tau_m(d)\tau_k(d) \ll_{k,m} D(\log D)^{km-1}$

by Theorem 14.2. Recalling that $X = y$ here, Axiom 3 holds with

(18.10)   $D = y/(\log y)^{A+km}$.
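The claim $r_d = O(\nu_f(d))$ comes from splitting $(x-y, x]$ into complete residue systems mod $d$ plus a leftover block of fewer than $d$ integers; in fact $|r_d| \le \nu_f(d)$. The following Python sketch (hypothetical names; $f(t) = t^2+1$ and the values of $x$, $y$ are arbitrary choices) checks this inequality for all $d \le 300$.

```python
def nu_f(d):
    """Number of roots of a^2 + 1 modulo d."""
    return sum(1 for a in range(d) if (a * a + 1) % d == 0)


def remainder(x, y, d):
    """r_d = A_d - y * nu_f(d)/d for A = {n^2 + 1 : x - y < n <= x}."""
    A = sum(1 for n in range(x - y + 1, x + 1) if (n * n + 1) % d == 0)
    return A - y * nu_f(d) / d


x, y = 5000, 1000
for d in (5, 10, 13, 65, 85):
    print(d, nu_f(d), round(remainder(x, y, d), 3))
```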


Example 18.8. In Example 18.4 we noted that $r_d = O_C(xe^{-c\sqrt{\log x}})$ for $d \le (\log x)^C$. This allows us to verify Axiom 3 with $D = (\log x)^C$ for any fixed $C > 0$. However, this is a much smaller level of distribution than the one we obtained in relation (18.10). This poses a serious hurdle if we want to study twin primes using the set-up of Example 18.4.

On the other hand, if we assume the Generalized Riemann Hypothesis, we have $r_d = O(\sqrt{x}\log x)$. We may thus verify Axiom 3 with $D = \sqrt{x}/(\log x)^{A+m+1}$. Remarkably, Bombieri and A. I. Vinogradov proved unconditionally (i.e., without the assumption of any unproven hypotheses) that we have a level of distribution that is almost as strong.


Theorem 18.9 (The Bombieri-Vinogradov theorem). Fix $A > 0$. For $x \ge 2$ and $1 \le Q \le x^{1/2}/(\log x)^{A+3}$, we have

(18.11)   $\sum_{q\le Q}\max_{y\le x}\max_{(a,q)=1}\Big|\pi(y;q,a) - \frac{\mathrm{li}(y)}{\varphi(q)}\Big| \ll_A \frac{x}{(\log x)^{A+1}}$.


This landmark result yields Axiom 3 with $m = 1$ and $D \asymp \sqrt{x}/(\log x)^{A+3}$ in Examples 18.4 and 18.6, and with $m = 1$ and $D \asymp \sqrt{N}/(\log N)^{A+3}$ in Example 18.5. We will prove it in Chapter 26.


Remark 18.10. It is believed that the Bombieri-Vinogradov theorem can be extended significantly. More precisely, Elliott and Halberstam conjectured the following improvement.

The Elliott-Halberstam conjecture. Fix $A, \varepsilon > 0$. Relation (18.11) holds uniformly for $x \ge 2$ and $1 \le Q \le x^{1-\varepsilon}$.


The Elliott-Halberstam conjecture is very deep, going well beyond the reach of the Generalized Riemann Hypothesis. Among other things, it implies that the level of distribution is $D = x^{1-\varepsilon}$ in Examples 18.4 and 18.6, and $D = N^{1-\varepsilon}$ in Example 18.5. Partial results towards it have been proven by Bombieri, Iwaniec, Fouvry, Friedlander and Zhang [11–13, 49–52, 188].
On the other hand, Friedlander, Granville, Hildebrand and Maier disproved (18.11) when $Q = x/\exp(A(1-\varepsilon)(\log\log x)^2/\log\log\log x)$, building on the earlier work of Maier that we will discuss in Chapter 30.


The fundamental lemma of sieve theory 


Assuming Axioms 1–3, our goal is to substitute the exact identity

$S(A,P) = \sum_{d\mid P}\mu(d)A_d$

by upper and lower bounds

(18.12)   $\sum_{d\in\mathcal{D}^-}\mu(d)A_d \le S(A,P) \le \sum_{d\in\mathcal{D}^+}\mu(d)A_d$,

where $\mathcal{D}^{\pm}$ are certain subsets of $\{d : d \mid P\}$ for which both sides of (18.12) can be bounded asymptotically. We will accomplish this goal in Chapter 19 by extending and improving the ideas of Brun presented in Chapter 17.


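Substituting the formula for $A_d$ provided by Axiom 1 in (18.12), we find that

$X\sum_{d\in\mathcal{D}^-}\frac{\mu(d)\nu(d)}{d} + \sum_{d\in\mathcal{D}^-}\mu(d)r_d \le S(A,P) \le X\sum_{d\in\mathcal{D}^+}\frac{\mu(d)\nu(d)}{d} + \sum_{d\in\mathcal{D}^+}\mu(d)r_d$.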


In order to be able to apply Axiom 3 and estimate the sum of the remainder terms, we must assume that $\mathcal{D}^{\pm} \subset \{d \mid P : d \le D\}$. On the other hand, the sets $\mathcal{D}^{\pm}$ must be chosen in a way that

(18.13)   $\sum_{d\in\mathcal{D}^{\pm}}\frac{\mu(d)\nu(d)}{d} \approx \sum_{d\mid P}\frac{\mu(d)\nu(d)}{d} = \prod_{p\in P}\Big(1-\frac{\nu(p)}{p}\Big)$.

Let $y = \max P$, so that $P \subset \{p \le y\}$. Since the sets $\mathcal{D}^{\pm}$ can only contain integers $d \le D$, considerations based on Theorems 16.3 and 16.4 suggest that (18.13) can be accomplished as long as the ratio $\log D/\log y$ is large enough. The following theorem confirms this heuristic.


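Theorem 18.11 (The Fundamental Lemma of Sieve Theory). Consider $A$ and $P$ satisfying Axioms 1 and 2 for some $\kappa, C > 0$. Set $y = \max P$ and $u_\kappa = 1 + 2/(e^{0.53/\kappa} - 1)$, and note that $1 < u_\kappa \le 1 + 3.8\kappa$.

(a) Uniformly for $u \ge 1$, we have

$S(A,P) = \big(1 + O_{\kappa,C}(u^{-u/2})\big)\,X\prod_{p\in P}\Big(1-\frac{\nu(p)}{p}\Big) + O\Big(\sum_{d\le y^u,\ d\mid P}|r_d|\Big)$.

(b) Assume Axiom 3 with $m = 1$, $A = \kappa+1$ and $D \ge y^{u_\kappa}$. If $\log X \ge \log y$ and $D, X$ are large enough in terms of $\kappa$ and $C$, then

$X\prod_{p\in P}\Big(1-\frac{\nu(p)}{p}\Big) \ll S(A,P) \ll X\prod_{p\in P}\Big(1-\frac{\nu(p)}{p}\Big)$.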

We will prove Theorem 18.11 in Chapter 19. In part (b) of its statement, the crucial quantity is $D^{1/u_\kappa}$, because it determines the maximum size of primes we can sieve with. To get a sense of the quality of our result when $\kappa$ and $D$ vary, we discuss the case of twin primes.


Example 18.12. Recall that we have two set-ups for getting our hands on twin primes. In the first set-up, given in Example 18.2, we have $\kappa = 2$ and $D = x^{1-o(1)}$, so that $D^{1/u_2} = x^{1/u_2 - o(1)} \approx x^{1/7.59}$. On the other hand, in Example 18.4 we have $\kappa = 1$ and $D = x^{1/2-o(1)}$, so that $D^{1/u_1} = x^{1/(2u_1) - o(1)} \approx x^{1/7.72}$. Hence, the first set-up allows us to sieve with larger primes. However, when $\kappa = 1$, it is possible to establish a version of Theorem 18.11(b) valid for $y \le D^{1/2-\varepsilon}$, which becomes $y \le x^{1/4-\varepsilon}$ in the set-up of Example 18.4 (see Chapter 12]). On the contrary, the best known version of Theorem 18.11 when $\kappa = 2$ is valid for $y \le D^{1/4.2664}$, which becomes $y \le x^{1/4.2664}$ in the set-up of Example 18.2 (see Theorem 6.1 or [34]).
Corollary 18.13. For x > 2, we have 
q(x) = #{p<ax:p+2 is prime} « «/(logz)? 


Exercises 191 


and 
#{p <a: Q(pt+2) <7} > a#/(loga)?. 
Proof. For the first part, note that 
(a) < S(A,P) + O(2/7*) 


with A = {p+2:p <2} and P = {p < x'/78}. By our analysis of 
Example we may apply Theorem [18.1i{b) with X = li(x), v(d) = 
1(a,2)=14/p(d) and level of distribution D = /x/(logx)'°° (assuming the 
Bombieri- Vinogradov theorem). Consequently, 


SAP) = (2) TT (1 Oh) II (1 -) <p 


pEP 3<p<al/78 


by Mertens’ third estimate. The claimed upper bound on 72(z) then follows. 


For the second part, we may assume that x is large enough. We have 
ic 
Sas O42) <7) S84, P) > ——: 
#{p <x: Q(p+2) <7} > S(A,P) > flees? 
Indeed, if p < x is counted by S(A,P), then all prime factors of p + 2 are 
> ¢/78_ But an integer < x + 2 can have at most seven prime factors 


> «'/78_ This completes the proof. 


Remark 18.14. Chen [24\/25] proved that the second part of Corollary[18.13] 
is true with the number 7 replaced by 2. A proof of this result that comes 
remarkably close to the twin prime conjecture is presented in Section 
25.6] and in Chapter 11]. 


Exercises 


Exercise 18.1. For 2 > y > 3, use Theorem[I8.1]] to prove: 
(a) If m is 2'/“-smooth, then 


#{n <a: (n,m) =1} = (14+ O(e-™)) - xy(m)/m. 


(e) If hh = (hy,..., hx) is a fixed admissible k-tuple, then 
#{n<a:nt+hy,...,n+h,x are all primes} <p x/(logx)*. 


(f) #{p <a:p—1is the sum of two squares } < a/(log x)?/?. 


Chapter 19 


The Fundamental 
Lemma of Sieve Theory 


In Chapter we saw how Brun used some simple facts about the 
inclusion-exclusion principle to obtain upper and lower bounds for 72(z, z). 
In the present chapter, we will generalize and improve these bounds with 
our end goal being to establish the Fundamental Lemma of Sieve Theory. 


Given an integer n and a set of primes P, let us write P~(n) to denote 
the smallest prime factor of n from the set P with the convention that 
P-(n) = 1 if (n,P) = 1. Given any sequence A = (a,)?2, C Reso with 
Den>1 An < 00, we have 


S(A,P) = » dm =~ an— >) ~ An 


(n,P)=1 n21 picP P-(n)=p1 
(19.1) =Soam- >> >) apm. 
n>1 picP P-(m)>p1 


This formula is called Buchstab’s identity and its importance is that it allows 
us to perform inclusion-exclusion one step at a time. To see why it is true, 
note that if n > 1 is such that (n,P) > 1, then there is a unique p; € P 
with P~(n) = pi. Equivalently, pijn and P~(n/p1) > pi. Setting n = mp, 
completes the proof of (19.1). 

Recall the notation Ag defined in (18.2). The first term on the right side 
of equals A;, so it can be estimated using Axiom [B] Next, we want 
to estimate the double sum over p; and m in (19.1). For each fixed p; € P, 
we are asking for a bound on S(A,,,P M [2,p1)), where Ap, = (@p,m)?°=1- 
Getting such a bound might be impossible for certain p;. For instance, if 


192 


19. The Fundamental Lemma of Sieve Theory 193 


pi is bigger than the level of distribution D, then we cannot say anything 
meaningful about > 7,51 @pim = Ap,- For this reason, we will discard certain 
“inconvenient” primes p;. In general, given any set II; C P, we have the 


upper bound 
S(A, P) -S > vee: 


piel P-(m)>p1 


We now iterate the above argument: applying Buchstab’s identity (19.1) 
with PN (2, pi) in place of P, and with (ap,m)°°_, in place of (a,)?2,, yields 


S Apym = Ap, — 5 5 Api pom: 


P-(m)2>p1 p2EPN|2,p1) P~(m)>pe2 


Hence, our upper bound for S(.A,P) can be rewritten as 


S(A,P) <Ai- So Apt SESS SS apipem: 


piel p2<p1_ _ P-(m)>pe 
pi ctl, p2eP 


The “unknown” rightmost sum has non-negative weight now, so we cannot 
drop any potentially inconvenient terms from it. We rewrite it using a new 
application of Buchstab’s identity (19.1), this time with P/M [2,p2) as our 
set of primes, and with (@p,pym)?°_, in place of (a,)°°,. Thus 


S(A,P) — So Ant SOSS Anis 


piell, p2<Pp1 
piel, p2eP 


~ ay a Apipepsm: 


p3<p2<p1_ _ P-(m)>p3 
pict, p2,p3E€P 


We may now choose any set 3 C P x P x P and obtain an upper bound: 


S(A,P) < Ai — > Ap, + o> Apipo 


piel p2<p1 
piel, p2EeP 
_ S S S S Ap, p2p3m- 


p3<p2<p1 P-(m)>p3 
piclh, (p1,p2,p3)€ls 


Continuing this way, we find that, given any choice of sets II2j;-1 © 
Pi-!, § € Zs1, we have the general upper bound 


(19.2) S(A,P) < S> pld)Aa, 
dEegr 


where 


We _ neem PL >t > Pry pj © P for all 9, 
ae = = {dan Pr (py,...,p;) € Uy for all odd j <r ‘ 


194 19. The Fundamental Lemma of Sieve Theory 


Similarly, given any choice of sets 2; C P”’, j € Zs1, iterating Buch- 
stab’s identity and dropping certain terms at every even step, leads us to 
the lower bound 


(19.4) S(A,P) > SD pld)Aa, 
dEQ- 
where 


(19.5) 9 ={dapp: M> > Pr BSP oral \ 


(p1,--., pj) € Hl; for all even j <r 


Evidently, this construction offers a great deal of flexibility. For example, 
the choice Il; = P! for 7 < 2 and Il; = 0 for 7 > 2¢ corresponds to the 
Bonferonni inequalities that led to (17.5) (see also Exercise 17.2). This 
choice is often called Brun’s pure sieve. 


Generally speaking, the upper and lower bounds for S(.A, P) we obtained 
in and constitute part of the theory of the so-called combina- 
torial sieve. This is not the only way of producing bounds for S(A,P), as 
we will see in Chapter 


Brun’s sieve 


Brun introduced more sophisticated choices of sets I]; that can be motivated 
by considering what the prime factors of a typical integer look like. We 
present a variation of his argument below. 


Throughout, we let 
y=maxP and D=y". 
Recall that S(y) denotes the set of y-smooth numbers. In view of relation 


(18.13) and the discussion surrounding it, our goal is to choose the sets I]; 
in such a way that 


d)v(d d)v(d u(d)v(d)lap 
an) MAHL Md) _ 5 HOWLADLgp 
de gt a|P deS(y) 

while ensuring that = C [1, D]. 


If we assume that v(p)lpep ~ & on average (e.g. we assume Axiom [2’), 
d) 


then v(d)1qjp behaves similarly to «(4 on average. Now, let pi > po > 
--+ > pp be the prime factors of d in decreasing order. A variation of Theo- 
rem implies that, when we weigh d € S(y) with x’ /d, the sequence 
{log pi,...,logp,} typically decays exponentially with ratio of consecutive 
terms ¥ exp(—1/«). In addition, for the largest prime factor, we typically 
have logp; = logy. Hence, the typical asymptotic behavior of the prime 
factors of d is 


log pj © (logy) - e9/". 


Brun’s sieve 195 


Here we are weighing d with (d)v(d)1qp/d that has alternating signs, but 
we still expect a similar behavior for the prime factors of d: almost all the 
weight of > Jaes(y) H(d)v(d)1ayp/d should be supported on integers for which 
log p;/log y © exp(—cj/«) for some appropriate c. 


Motivated by the above discussion, we set 


(19.7) Tl; ={(pi,.-.,p;) € P2 spi > +++ > py, pj < y; }; 


where y; are certain cut-off parameters that decay doubly exponentially. 
Their precise definition is a bit technical: given an integer J > 0 and real 
numbers a € (0,1) and w € [2, y], we let 


y ifj < J, 
(19.8) Yoj—1 = Yoj = yo" if I <j <K, 
w otherwise, 


where K is the largest integer such that yo” > w. 


Note that y; = y for 7 < 2J, so that the sets I; do ca restrict the first 
2J prime factors of integers d € Y*. This will ensure when J — oo, 
provided that a is close enough to 1. In addition, note a the sets II; do 
not restrict the prime factors < w of integers d € Y*. This last condition 
is of a more technical nature and the reason why we insert it will become 
clearer later on (see relation (19.13) below). 


Choosing J, a and w appropriately, we prove: 


Theorem 19.1 (The Fundamental Lemma of Sieve Theory, II). Let « > 0, 
C>ly21,PC{p<y} and D=y" with u > uy = 1+ 2/(e%3/* — 1). 
If D is large enough in terms of & and C, then there are two arithmetic 
functions A* such that: 


(a) A*(1) =1, |A*| <1, supp(*) C {d|P:d< D}; 
(b) (L*A7)(n) < lm pyar < (1* AT) (n) for all n € N; 


(c) ifv is any multiplicative function such that 0 < v(p) < p for all p € P, 
and which satisfies Ariom 2] with parameters «x and C, then 


2 Ty (1-20) <3 52 vo (1-2) 


pep PL ap “IP ier 
and > NOW) = {1+Ox,0(u-*/?)} II (1-42) forX€{\t, NF. 
d|P pEeP 


Firstly, let us prove how this more technical form of the fundamental 
lemma allows us to deduce Theorem [18.11 


196 19. The Fundamental Lemma of Sieve Theory 


Proof of Theorem assuming Theorem Let us consider A+ 
as in the statement of Theorem[19.1] with P, y, «, C as in Theorem[18.1 and 
D=y". Since lwppya1 < (1*AT)(n), we have S(A,P) < 30, an Dale At (d). 
Interchanging the order of summation and applying Axiom [I] yields the up- 
per bound 


At (d)v(d 
(19.9) S(A,P)< De On + Rt, where Rt := » At (d)ra. 
Similarly, we have the lower bound 


(19.10) S(A,P)>x > Ou) +R, where Ro := > A” (d)ra- 
d d 


Since |A*| < 1 and supp(A*) C {d|P,d < D}, we have |R™| < Dap acp Ial- 
If we assume Axiom 3] with A = «+1 and m = 1, we thus have |R*| < 
X/(log X)**+!. If we further suppose that log X >> logy and that X is large 
enough, then Axiom 2]implies that |R*| < 10~1°°X Hep ( — v(p)/p). 


By the above discussion and Theorem[19.1] both parts of Theorem[18.11 
follow immediately when u > u,. It remains to prove part (a) when u < ux. 


Let z= DV“ <y and Pc, = PN [2,2]. We then have 


0< S(AP) < S(A, Pez) Kno X T] -v(p)/) 


PEP <z 


by the portion of part (a) already proven. In view of Axioms [I] and [2] we 
have 


pepe, (1 — u(p)/p) - 
~ [pep(— v(p)/p) > 
Hence, 0 < S(A,P) « X[[,<p(1 — v(p)/p), and Theorem [18.11{a) follows 


in this case too by assuming the implicit constant in its statement is large 
enough. 


(1+ C/log z) - (ux/u)” <.,c 1. 


Proof of Theorem Let y* = y*(K,C) be a large enough constant to 
be chosen later. If y < y*, we simply take A*(d) = p(d)-1gp. We then 


trivially have 
A* (d)v(d) v(p) 


d|P pEP 


In addition, any d in the support of A~ satisfies d < TTp<y* p < D provided 
that D is large enough. This proves the theorem in this case. 

Assume now that y > y*. Let ¢ = e(K,C) be a small enough constant, 
and let u* = u*(K,C,e) be a large enough constant, both to be chosen later. 


Brun’s sieve 197 


We then define A*(d) := lgeg+p(d) with the sets Il; given by (19.7) and 
the parameters y; satisfying (19.8) with 


elogy a =e 9-53001/e J — 0 ifu, <ucu', 
w= —, ; 
2 a=e/* J=[3u/10] ifu>u*. 


To check that A* satisfy condition (a), we must verify that J* Cc [1, D]. 
Ifd = p,---psdo € D” with py > po >--- > ps > w and do| Tpepaa,w] Ps 
then pj, ..., Poy41 Sy and poj41 < paz < yo; for all 7 > J+1. In addition, 
if y* is large enough, then our choice of w and the Prime Number Theorem 
imply that do < Lae? < y® for all y > y*. Hence, 


log d 2 log do 
logy ~ logy 


; » 
19.11 S149 poi ho. 
( ) +2J+14+2 S° a e+ 2J-+1+ ~~ 


; 1 
jeJt+l 

The right side of (19.11) is < u if u > u* and u* is large enough, because 
1/(a~' — 1) ~ u/5 in this case. In addition, the right side of (19.11) is < u 
if wu € [u,,u*] and < is small enough, by the definition of u,. Hence, there 
are choices of u* and € such that Z~ C [1, D]. 

Similarly, if d = pi---psdo € D* with p, > po >--: > ps > w and 
do| Teprpa,w}]P, then pi, ..-, pas < y, p2j < paj—-1 < yaj-1 for all j > J +1, 
and dp < y®. Arguing as above, we infer that 

log d 
log y 


<e+2J+2 .s ai-4 <u, 

jeJt+l 
provided that ¢ and u* are chosen appropriately. As a consequence, we also 
have Yt Cc [1, D]. This establishes condition (a). 


To check that (b) is satisfied, we follow the argument leading to (19.2) 
and (19.4), but this time starting from the indicator version of Buchstab’s 
identity that reads 


(19.12) Linpjat =1— SO lpn: Lp-(n/pr) 21 
pEP 


Alternatively, we may simply note that (b) follows by applying (19.2) and 
(19.4) to the sequence A defined by az = lpn. 


It remains to prove that the functions A+ satisfy condition (c). To this 


end, set 
d)v(d V 
v= 5 OO. TT (1-22), 


d|PN[2,z) pEP, p<z : 


as well as 
0.53001 ifu,<u<u*, 


5k /u ite a". 


B =-—-kKloga= 


198 19. The Fundamental Lemma of Sieve Theory 


Since v(p) < p for all p € P, we have V(z) > 0 for all z. In addition, since 
Yoj—-n = max{y””, w} for 7 > J and he {0,1}, Axiom 2] implies that 


PEPN(y25-ny] 
(19.13) < exp((j —J)B +e), 


since we may assume that y* is large enough so that w = 0.5< logy > e°/< 


for all y > y*. We remark that (£9.13) also implies that 
(19.14) ye i) <U-JS)Bt+e (j > J, he {0,1}), 
PEPN(yaj—nsy] 


as it can be seen using the inequality v(p)/p < —log(1 — v(p)/p). 


Now, let us consider d|P such that d ¢ Yt. If we write d = p,--- pr, 
then there is a unique integer 7 > J such that 


(19.15) poj—-1 > Yoj-1 and = pog-1 K Yor-1 (I KE < J). 
Hence, there is a unique way to write 
d =pi-- -poj—1d’, 


where d’ | [pepatzpo;—1) p and the primes pj, ..., p2j—1 are a strictly decreas- 


ing sequence of elements of P satisfying (19.15). Since u(d) = —p(d’) and v 
is multiplicative, we conclude that 


(19.16) V(y) x no _ > Vaj-1; 


d|P j>J 
where we have set 
VY (er ere VY 
P1***Pm 


Ym <Pm <0 <prsy 
P1y-+;PmEP 
pi<yi (i<m, i=m (mod 2)) 


Similarly, we have 


(19.17) V(y)- >> Mate = S Vay. 


d|P j>d 


Next, we fix an integer m > 2J and proceed to the estimation of Vj. 
Note that 0 < V(pm) < V(Ym) for Pm > Ym. Since the function v is non- 
negative, we infer that 


0<Vin<V(ym) = D Y(pr) VP) 


Ym <Pm<i< pi <y Pl Pm 


Ply PmEP 


Brun’s sieve 199 


By rearranging the primes pj, ..., Pm in all possible m! ways, we find that 


Vn < Vm) MOD Hm) ¢ Vm) (>) 
ml! Pi‘ **Pm m! Pp 
PiyPmEP Ym] PEPA(Ym 


Writing m = 27 —h with h € {0,1} and j > J, and applying (19.13) and 


(19.14), we arrive at the inequality 
(j-J)B+e((7 _ J 2j—-h 

e j-J)B +e) 
Vaj_-n < V(y) wt =a 7 


If we let 7 = J+ and sum the above inequality over all @ > 1, we find that 


asd apala (6) a) ee 


7 -n< 
(19.18) 2 Vain V(y) > (27+ 20— fh)! 


For the first part of the theorem, note that 6@+e¢ < 0.530022 as long 
as u* is large enough and ¢ is small enough. In particular, the summands 
on the right-hand side of are decreasing as functions of J, and we 
deduce that 

jos Vairh Z =, eh antes Banda 2” Ale 11/10? ifh =O, 

V(y) (2€—h)! ~ 13.9 ifh =1, 


where the last inequality is verified numerically. Together with (19.16) and 
(19.17), this completes the proof of the first part of condition (c). 


For the second part of (c), we may assume that ¢ < « and u > u*, with 
u* large enough so that 8 = 5«/u < min{1/6,¢/2} and J = [3u/10| > 2¢/8. 
Since n! > (n/e)” for all n € Zo, we have 


ypu Vaj-h Z 3 eBlte (Bp 4 c)2I+2l-h Z Heer ee) \ 
Vi(y) (27 +2@-h)! ~*~ 23 +22—h , 


In addition, noticing that e!+9/? < e!3/ < 3 and that Bl+e < 2max{ Be, e}, 
we find that 


l=1 


é=1 


e9/2+1(B8 + €) = J8e/T<K/u if <e/8, 
2J+2/-—h ~)38=15K/u  ifl>e/f. 


In any case, the right-hand side is < 15«/u. Assuming that u* > 30K as we 
may, we conclude that 


ve, ps Vp! as 
Hat h < Seay < (15«/u)?7 <K yo? 


(=1 
This completes the proof of the theorem. 


200 19. The Fundamental Lemma of Sieve Theory 


Sieve weights 


A careful reexamination of the proof of the Fundamental Lemma of Sieve 
Theory reveals that its most crucial component is the construction of the 
arithmetic functions A* from Theorem [19.1] These functions replace the 
exact Mobius inversion formula 


(19.19) linpya1= >> ud) 
d|(n,P) 


with an upper and a lower bound of the form 


(19.20) S>A7(4) < lap) < SOA). 
d|n d\n 

In general, a function A* that is supported on {d|P : d < D} and that 
satisfies the right inequality of for all n is called an upper bound sieve 
of level D for the set of primes P. We then write AT € At(D,P). Similarly, 
an arithmetic function A~ : N > R that is supported on {d|P :d< D} and 
that satisfies the left inequality of for all n is called a lower bound 
sieve of level D for the set of primes P, and we write A~ € A-(D,P). 

Given any choice of sets I]; C P’, the functions A+(d) = lgeg+p(d) with 
JY defined by (19.3) and (19.5) are in the classes A*(D,P). Indeed, this 
assertion follows from relation (£9.12) and the discussion surrounding it. 


All sieves A* € A*(D,P) yield bounds for S(A,P) as per (19.9) and 
(19.10). A good choice of A* should have the additional property that the 
upper bound in and the lower bound in are as close to each 
other as possible. This roughly means that the convolutions w+ = (1*A*)(n) 
behave on average similarly to 1(,,p)=1. 


The above point of view of sieve methods will be very useful when study- 
ing gaps between primes in Chapters [28] and where our goal will be to 
construct a sieve weight w, that correlates strongly with many of the inte- 
gers n,n+1,...,2+ H being prime. In particular, the non-negativity of 
the sieve weights 1 * AT will be crucial. 


Sifting limits and the beta sieve 


There is a construction of sieve weights that yields a version of Theorem 
18.11] with a smaller constant in place of u,: given a parameter {, we let 


(19.21) Tj ={(pi,pe,..-,p;) € P2 spi > +++ > py, pies Dj < D/py }. 


The upper and lower bound sieves produced from these sets I; are together 
called the beta sieve. It was introduced by Rosser and was fully developed 
by Iwaniec. To explain why we choose the sets II; in this specific way, we 
must introduce the concept of the sifting limit. 


Sifting limits and the beta sieve 201 


Given a dimension x, let 2, be the infimum of all numbers 6 such that 


(19.22) S(A,P) > X]] (1 = v0) 

pEP P 
whenever the pair (A,P) satisfies Axioms [IH3] the second one with dimen- 
sion « and the third one with level of distribution D > (max P)°. That is 
to say, 6, is the infimum of all numbers with which we can replace u, in 
Theorem [18.11] and still have a version of the lower bound there. 


Now, given a pair (A,P) as above and some primes p; > --- > p; from 
the set P, we have 


v(d)_v(pr-+-Py) 


Ap, -pjd = d Pie Dj X + Tp--pjd 
whenever d| [pepntz,p,) P: We thus see that Axiom [I] holds for the se- 
quence Ap,...p; ‘= (Qp;..pjm)m=1 and the set of primes P/M [2,p;) with 


X TP_, v(pi)/pi in place of X, Tp,--p;d i place of rg and with the same 
multiplicative density v(d)/d. It is reasonable to expect that Ap,..p,; has 
level of distribution D/(p1--+p;) (for instance, consider the case when A = 
{x-—y<n<zx«}). Hence, if we assume that 6 > 6,, then 


S(Api-pys P.O 2.p))) > X J] (1 - “to hee. Dtpreph) x 
pEP ? 
P<DP; 


by our hypothesis (19.22). Assume, now, we want to construct a lower bound 
sieve for S(A,P) using iterations of Buchstab’s identity (£9.1) as explained 
earlier in this chapter: we have 


S(A,P) = Ai — S> S(Ap,, P91 [2,p1)) 


piecP 
=Ar- SY) Apt SID) S(ApivasP 2 (2,22). 
picP P1,p2EP, p2<p1 


Since we cannot control the terms with D/(pip2) < ps , we drop them and 
set 


Tz = { (p1,p2) € P? : pi > pr, pips < D/p5 }. 
Continuing as above, and dropping each time the terms with D/(p; --+p;) < 
p’, we arrive at the choice (19.21) for the sets Is. 


d 
Notice that our hypothesis that relation holds when maxP < 
D*/® is fed into itself, thus becoming a “self-fulfilling prophecy”. This is 
a typical feature of sieve-theoretic functions, whose asymptotic behavior is 
often ruled by delay differential equations. We already saw this phenomenon 
in the study of smooth and rough numbers. In the case of the beta sieve, 


202 19. The Fundamental Lemma of Sieve Theory 


Iwaniec proved that choosing the sets I]; by (19.21) for an appropriate value 
of 6 leads to the inequalities 


5(A,P) 
Ter Tawa) 


under Axioms[1H3] where P C [1, y], u = log D/ log y and the functions f and 

F are the solutions to the following system of delay differential equations: 
u’F(u)=A ifu<B+l, | (u®F(u))! =xcu* 1 f(u-1) ifu>6+1, 
vfMmHB ifas< £, (u®f(u))’ =Ku*-!F(u—1) ifu> Zp 


for certain parameters A and B. In particular, we have 


< F(u)+o0(1) (D,X > oo) 


A>1l, B>o0, @=1 when k < 1/2, 
A = 2(e/n)/?, B=0, B=1 when « = 1/2, 
A>1l, B=0, 1<B<2 when « > 1/2, 
A = 2e7, B=0, @=2 when « = 1, 


A & 21.7484437308, B=0, 6 + 4.8339865967 when « = 2, 
where the calculation of A and £ in the last two lines is due to S. Blight. 


A comprehensive discussion of the beta sieve can be found in Chapter 
11 of the book by Friedlander and Iwaniec [59]. In particular, Section 11.19 
there gives numerical approximations for A and (£ for more values of kK. 


Exercises 


Exercise 19.1. Given a finite set of primes and multiplicative density function 
6:N— (0, 1], we set 
Vs(P) = [[ A - 4(p)). 
pEP 
If, in addition, d(p) < 1 for all p € P, we define the relative density function 


(19.23) o*(q) = Il i — for all q|P. 
plq 


(a) If \ is an arithmetic function supported on {q|P}, prove that 
S~ X(q)4(a) = Va(P) S© 5*(q)(1 * A)( 
a|P a|P 


(b) If A*¥ € A*(D,P) and 6), 52 are two multiplicative functions with 0 < d1(p) < 
62(p) < 1 for all p € P, then prove the monotonicity principles 


Dap > ~ (q)2(q) a al? A~ (d)61(q) 
V5. (P) * V5, (P) 


and ae A* (q)62(q) . ale AT (q 
) = 


IN 


Exercises 203 


Exercise 19.2. It is often easier to construct upper bound sieves rather than lower 
bound ones. This exercise shows how to pass from a collection of upper bound sieves 
to a lower bound sieve. 


Consider a number D > 1 and a set of primes P. Suppose that for each prime 
p € P we are given a sieve 43 € AT(D/p, PN [2,p)). Show that the function 


1 ifd=1, 
A (d) = 4 -At(d/p) if dP, d>1, p= P-(d), 
0 otherwise. 


is a lower bound sieve of level D for the set of primes P. 


Exercise 19.3* (The Brun-Hooley sieve). This exercise develops a variation of 
Brun’s pure sieve that leads to results of the same strength as Theorem [18.11] 
Throughout, A and P satisfy Axioms [I}3] with sifting dimension x, level of distri- 
bution D and A = «+1. In addition, y = maxP and u = log D/logy, with u 
assumed to be large enough in terms of & and C. 


(a) If P = Gina 1 Pr is a partition of a set of primes P, then prove that 


lin,P)= is <TI yo u(d,) 


r=1_ d,|Pr 
w(d,)<20,. 


for any choice of integers £,.. 


(b) Fixe > Oand A> 1. Set y, = y» ” and let R be the biggest integer such that 
yr > eC/*. Then define Pr = PN [2, yx] as well as P, = PN (Yr41, yr] when 
l<r<R-1. Finally, set 


Gf ={d=d,::-dp:d,|P,, w(d-) < 26, (1<r<R-1)} 
with ¢,=|r(u—uo)(1 — 1/A)?/2|, where wo =log([] ep, 
the function \* (d) =1geg+p(d) is in the class At(D,P). 
Let v be the function from Axiom [I] and set 


d)v(d d)v(d 
wie YS = v= MOM 


d|P,, w(d)< 20, d|P,,. 


p)/ logy. Prove that 


— 
fe) 
YN 


and E, = V,* — V,, with the convention that €r = oo. Prove that 


v(d) - («log \ + €)?rt1 
d ~~ (2é, +1)! 


O< FE, < 
d|P,., w(d)=20,.+1 


forr=1,...,R—-1. 


Prove that 


— 
pa 


R 
[lv -I[ve= yh V,-1E,V,4.4+-+Va_ Vr < Se°Vy +++ Ve 
with S = 027" £,./V,. Conclude that 


S(AP) < (14 One((u+ 1"? +1/ log. X))X T] (1-22), 


peP P 


204 19. The Fundamental Lemma of Sieve Theory 


(e) Use Buchstab’s identity to show that 


S(A,P) > (1- Ox,0(u-"/? + 1/ (log X)°) px T] (1 “o)). 


pEP 


Exercise 19.4% Assume the notation and assumptions of Theorem In par- 
ticular, A+ are the sieve weights constructed in its proof, y= maxP and D = y". 
All implied constants below may depend on the parameters « and C. 


(a) Let f : N > C be an arithmetic function for which there is a multiplicative 
function v as in Theorem c) and some S$ > 0 such that 


3 MORE) < su(m) II (1-2) 


d|PN[2,z) pEP, p<z P 


for all m| mee and all z € [1,y]. For \ € {At, 7}, prove that 


ye ee ea 5") 


d|P d|P 


(b) Let v be as in Theorem [19.1] Assume further there is some k > 0 such that 
0<v(p) <k for allpe P. For \€ ae + and r EN, prove that 


a ae ae) 


d|P d|P peP P 


with the implied constant depending also on r and k. 
(c) Let v be as in Theorem [19.1] For \ € {At,A7~} and x > y, prove that 


y v(n)pr(n) _ (Ax I (nju(n)u*(n) o( = (1 = “ny. 


nNKx nxu pS 
(n,P)=1 p€P 


[Hint: Use part (a) with f(d) = ia <n/q¥(da)u*(da)/a, v*(d) = pia” v(p)/(1+ 
v(p)/p) in place of v, and S = [],<,(1 + »(p)/p) = [pce (1 — ¥*(p)/p)~*] 


Exercise 19.5* (A study of the beta sieve). Let P be a set of primes and y = 
1+maxP. Given D = y“ with u > 2, let A*(d) = lacg+p(d) be the beta sieve 
weights (i.e, J* are given by (19.3) and (19-5) with the sets II; given by (19.21). 
In addition, let v be a multiplicative function such that 0 < v(p) < min{p — 1, k} 
for all p € P. 

(a) Let m = 27 —h with h € {0,1}. Assume that p; > pp >--- are some primes in 


P such that pi ---Pm—ip24 > D and pi--+pn—ip2t! < D for all n < m with 
n = h(mod 2). Prove that 


i-1 


—(u- B=1 
Pip2***Pr2—n—-1 < Dy ( (43) (l<ix 


j); 
and deduce that pm > y°" with dm = uni (Got) — 4) 


Exercises 205 


(b) If V(z) = pep, p<z(l — v(p)/p) and 


= S AP Pm) Vip) 


yo” <pm<<pi<y 


prove that V(y) = lap H(d)v(d)/d and 
A (d)v(d At (d)v(d 
V(y) — $0 Va5 < So <P SS < V(y) + 50 Vaj-1. 
j21 d\|P d|P jel 
(c) For any ¢ € [0,1), use Rankin’s trick to prove that 


Vm < aw ( S- re Viyo™). 


yom <pXy 
(d) Show that there is a choice of 8 < 1+ 4k such that 


yo OH = + Oc)VLW). 


d|P 


[Hint: Choose ¢ = w/ logy as in the proof of Theorem [16.3]] 


(e) If u > 14 2/(e°-5295/* — 1) and all primes of P are large enough in terms of k, 
prove that 


yr ow), Vo) 

d AO 

d|P 

[Hint: Take « = 0 in part (c).] 
Assume that log D > c+ (1 + 2/(e9-5295/* — 1)) logy with c large enough in 
terms of k. Construct a sieve \* € A~(D,P) that satisfies the conclusion of 
part (e) even when P contains small primes. [Hint: Partition P = P| U Po, 
where P} = PN [1, yo] and Pz = PN (yo, y]. Any d|/P can be uniquely written 
as d = did2 with d,|P;. Take A*(d) = (di) (dz) with A~ a lower bound beta 
sieve of level D/e®° for P2.] 


— 
a) 
ae) 


Chapter 20 


Applications of sieve 
methods 


Sieve methods are a versatile tool that can be employed in a great variety 
of ways. We demonstrate their utility by presenting several results where 
they play a key role. 


Primes in short arithmetic progressions 


When y/q > x*, a strengthening of Montgomery’s conjecture (see Exercise 
17.6) states that 

a 
y(q) log x 
This statement is well beyond the reach of the Generalized Riemann Hy- 
pothesis, which is not sufficient to detect primes in (x — y, x] when y < Vz, 


nor primes p = a(modq) that are < q?. Nevertheless, we can use a sieve to 
prove an upper bound of the expected order of magnitude. 


(20.1) #{x-y<p<2#:p=a(modg)}~ (x > oo). 


Theorem 20.1 (The Brun-Titchmarsch inequality). Uniformly for q € N, 
a € (Z/qZ)* and x > y > 4q, we have 


:p=a(mo — 
#H{E—y<p<aipsal dD} < Toytog@y/a) 


Proof. When y < 10gq, the result follows trivially by the fact that there 
are < y/q +1 integers in the arithmetic progression a(modq) that also 
lie in the interval (a — y,a]. Let us now assume that y > 10q and set 
A={2r-y<n<a#:n=a(modq)} and P = {p < z} with z = (y/q)™“*. 


206 


The Titchmarsch-Linnik divisor problem 207 


Any prime p € (a — y,2] that is in the congruence class a(modq) is 
either < z, or is counted by S(A,P). Hence, 


(20.2) #{xr-—y<p<x2:p=a(modgq)}< S(A,P) +z. 


We estimate S(A,P) using Theorem [I8.1i{b). We must first verify Axioms 
We have 


Aq=#{t@-y<n<2:n=0(modd), n= a(modq) }. 


If (d,q) > 1, there are no integers n in the intersection of the congruence 
classes 0 (modd) and a(modq), because (a,q) = 1. We thus have Ag = 0 
when (d,q) > 1. Assume now that (d,q) = 1. For such integers d, the 
Chinese Remainder Theorem implies that there is a unique congruence class 
ag (mod qd) such that Ag = #{x@-y <n<x4:n=agq(modqd) }. Therefore, 
Aq = y/(qd)+r¢ with |rg| < 2. In conclusion, Axiom[]holds with X = y/q, 
v(d) = 1aqy=1 and |rqa| < 2. Axiom 2] then obviously holds with « = 1, 
and Axiom B] with m = 1, A = 2 and D = (y/q)/(log(y/q))?. We may thus 
apply Theorem [18.1i{b) and Mertens’ third estimate (Theorem [3.4{c)) to 
deduce that 

y ee 
(20.3) )<s II (- = Il (1--) 


log z 
tage. pia 08 PSz, plq ‘ 


The last product is < brig! —1/p)~! =q/v(q). This completes the proof. 


The Titchmarsch-Linnik divisor problem 


Two consecutive integers are always coprime. More generally, we expect 
their multiplicative structure to be more-or-less uncorrelated. Thus, even if 
p is a prime number, then the integer p — 1 should still have the anatomy 
of a “typical” integer as described in Theorem [16.1] except for obvious re- 
strictions such as the fact that p— 1 is even if p > 2, or that p— 1 cannot 
be congruent to 2(mod3) if p > 3. It is then reasonable to guess that 


1 = 
S te(p — 1) = pas Ste (2) <p x(log x)*-?, 


Titchmarsch studied the sum on the left-hand side when k = 2 and eval- 
uated it asymptotically under the assumption of the Generalized Riemann 
Hypothesis. Subsequently, Linnik removed this assumption, so that the fol- 
lowing result now holds unconditionally. 


Theorem 20.2. For x > 3, we have that 


- G(2)6(3) x log log x 
27 ne ¢(6) o( log x ). 


208 20. Applications of sieve methods 


Proof. We follow an argument due to Rodriguez . Note that 


n= > i=ho+2 YO L 


ab=n aln,a<vJ/n 
Therefore, 
Sor(p-1)=255 So 14+0(Vz) 
pKa PSL a</p—1 
a|p—1 


=2 S° (n(#;a,1) — x(a? + 1;a,1)) + O(v2) 
a<Va-1 


by interchanging the order of summation of a and p. We have the crude 
bound 7(a? + 1; a, 1) = O(a?/[p(a) log(2a)]) from the Brun-Titchmarsch in- 
equality. This bound is < 2!/4 when a < «!/4, whereas it is < (x!/?/ log x)- 
a/p(a) when a € (#!/4,x'/?]. As a consequence, 


1/2 
> (a? +1,a,1) Kat. git 4 2 gl? ge F 
log x log x 
a<Va«-1 


with the last estimate following by Theorem applied with f(a) = 
a/p(a). In addition, the Bombieri- Vinogradov theorem (Theorem [18.9) im- 


plies that 


p<Q 
with Q = /z/(log x)?. Consequently, 
ya p—1) 2 + S- m(z,a,1)+O | 
log x 
p<ax ao © Q<ax<Vzr—-1 


Now, applying Wirsing’s theorem (Theorem [14.3) with f(a) = a/p(a), we 
find that 


x 


1(x, a, 1) 


log x 


Fy = RO tos + 011) = SOP tog + O(log o82) 
ax<Q 


Since li(xz) = x/log x + O(x/ log? x), we conclude that 


¢(2)¢(3) x log log x 
j= 12 ,a,1)4 , 
YS" r(p ) (6) x > m(xz,a,1) +O ieee 
psa Q<a</e—1 
Finally, we bound crudely the sum over a € [Q, Vx — 1], taking advan- 
tage that this is a short interval if we rescale it logatitiinically (we have 


loga = logx + O(loglogxz) when a € [Q,V/x-—1]). Indeed, using the 


Multiplicative functions over short arithmetic progressions 209 


Brun-Titchmarsch inequality and Wirsing’s theorem once again with f(a) = 
a/(p(a), we have that 


x x log log x 
SS aean< » ye B 
y(a) log x log x 
Q<ax<Var-1 Q<ax<Va-1 
This completes the proof of the theorem. 


Multiplicative functions over short arithmetic progressions 


Our last application of sieve methods is an analogue of the Brun-Titchmarsch 
inequality for multiplicative functions that greatly generalizes Theorem[14.2 
It was proved by P. Shiu [165]. 


Theorem 20.3. Fir k € N ande > 0. Given any choice of qe N, a € 
(Z/qZ)*, real numbers x > y > 1 with y/q > x, and a multiplicative 


function f such that0 < f < T,, we have 
—1 
) (2) <ee “exp { ) flp)—1 \ 
q —. # 


T—Y<ngx 
n=a (mod q) pla 
Proof. All implied constants might depend on & and ¢ without further 
notice. We may assume that x is large enough in terms of them. We begin 
by showing a preliminary estimate. 
Set z = y/q € [x*, 2] and note that ))i/ucpce 1/p < logu + O(1) for 
u > 1. We thus infer the bound 


(20.4) S> l< II (1 «) < “exp { x=} 


X-Y<n<cx pxzllu pu 2 
n=b (mod q) ota pla 
Po (n)>zl/4 


uniformly for X > Y > y/z!/*, u > 4 and (b,q) = 1, as it can be seen by 
(20.3) applied with z!/“ and b in place of z and a, respectively. Let us now 
show how to deduce the general case of the theorem from (20.4). 


Call S the sum in the statement of the theorem. The rough idea is to 
fix some small parameter 6 > 0 and to decompose each integer n in the 
range of S as n = mm! with P+(m) < z° < P~(m’). (Such a decomposition 
always exists and is unique.) Anatomical considerations based on Theorems 
and [16.4] suggest that m < z!/2 for a “typical” n, as long as 6 is small 
enough. Fixing such an m (which must also be coprime to q), we see that 
m’ € (a/m—y/m,x/m]| and m! = am(modq), where ™ denotes the inverse 
of m(modq). In addition, since 0 < f < 7% and Q(m’) < loga/log(z*) = 
O(1/6) (see Exercise 2.9{e) for the last inequality), we have f(n) < f(m). 
Thus, for each fixed m < z!/2, we should be able to estimate the sum over 


210 20. Applications of sieve methods 


m’ using (20.4). The problem is that there are various “atypical” integers n 
for which m! > z!/2. Dealing with them creates various technicalities. We 
give the details below. 

First of all, since f(n) < t,(n) < n®/*, the contribution of integers 
< 21/2 to Sis < 21/2x8/4 < 23/4. We consider now an integer n > 2/2 in 
the range of S. We decompose it in its prime factors, say n = ig oe p;' with 
pi < po <-++: < pr. If we let ny = [];_, p;’, then there is a unique integer 
r € [1, R] such that n,-_1 < 2/2 < n,. We then write m = n-_1, p! = Dp, 
vy = vp and m’ = n/m, so that P~(m') = p’, Pt(m) < p’' and (p’)"m > 
Vz>m. Since 0 < f < | and Q(m’) < logz/logp! < log(z!/*)/log p', we 
have 


(20.5) f(n) = f(m)f(m’) < fm) ke 1982/loe 


Moreover, the relation mm’ = a(modq) and our assumption that (a,q) = 1 
imply that (m,q) = 1 and m’ = am (mod 4q). 


From the above discussion, we infer that 
S <5, + $2.4 53+ 0(2*/4), 


where S; is the part of S with p' > z!/4, Sq is the part with p’ < z!/4 and 
m > 2/4 and S3 is the part with m,p! < z!/4. Note that v > 2 in Ss, 
since 2(¥+))/4 > (p')”m > z!/? for its summands. For this reason, the main 
contribution to S comes from S$; and Sj. We estimate each sum individually 
below. 


To bound 51, we apply (20.5) and then (20.4) to find that 


Si< kl) S> f(m) S> 1< S* f(m) sft 


M/Z (a—y)/m<m'<a/m M<VJz expt) pce, pla 1/p} 
(m,q)=1 m!=am (mod q) (m,q)=1 
P-(m')>z1/4 


since z = y/q. We then use (14.7) to arrive at the estimate 


et 


PKu 
pia 


Sy < Xz, where  := exp { 


Next, we estimate Sj. We introduce the checkpoints z; = 2’. There 
is a unique J € N such that zy41 < (logz)? < zy. For j < J, we let 
N; be those integers n in the range of Sz that also satisfy the inequality 
zj41 < Pt(m) < z;. Finally, we let Nz be the set of zj-smooth integers n 
in the range of Sy. We also write Sj; for the contribution of n € N; to Spo. 


Multiplicative functions over short arithmetic progressions 211 


-j-1 


First, we estimate S,; for 7 < J. Since p! > P*(m) > 2j41 = 27 in 
its range, an adaptation of the argument we used to bound Sj implies that 


Seog RPE SE fm) 


zV4cme/z (a—y)/m<m'<a/m 
(m,q)=1, Pt (m)<z; m!=am (mod q) 
PO (m)>2z541 
QWz/m 
eer ( j 


”) pl Cee, on PY 


2V4emeV/z 
(m,q)=1, Pt (m)<z; 


We then apply Theorem with f(™m)1(m,q)=1 in place of f to deduce that 
523K Az i; e/, 

In the sum 52,7, we do not have any precise information about the 
position of p’. We will thus estimate the sum over m/ yee The gains 


will come from the fact that m is zy-smooth, and here zy < (log z)®. More 
precisely, using the fact that f(n) < Th(n) < 20. Ol forn <a < z'/©, we have 


bye » 1 


al4emeiz «/m—y/m<m'<a/m 
P+(m)<(logz)& —m’=am (mod q) 


1 
z zi01 » = < y1.01-1/24+0(1) _ ign Ae) 


m>z/4 
P+(m)<(log z)® 


by Theorem We conclude that Sz < 4 S25 <K rz. 


Finally, it remains to bound $3. Note that (p’)’z!/4 > (p')”m > 21/2, so 
that (p')” > z'/4 in its range. Since we also have that p’ < z!/+, there must 
exist some integer ju > 2 such that 21/4 < (p')# < z'/?. Writing n = (p')#n’, 
and observing that p’ + q and that f(n) < t,(n) < 2°°! for n < x, we arrive 
at the estimate 


s<2y Y > 1 
p>2 zZi/4c(p!)M<zgl/2 (a—y) /(p!)#<n'<a/(p!)# 
p'tq n'=(p')#a (mod q) 
70. 01+11/12 


70-01 S. > z < <>. Ee ae K ziol-1/12 


ple 
M22 (p!)H>z1/4 M22 pl! 


by Rankin’s trick, since (p’)#/3 > z!/!? when (p')# > 21/4. We thus see that 
the contribution of 53 to S is negligible. 


Putting together the above estimates proves that S < Az, thus complet- 
ing the proof of the theorem. 


212 20. Applications of sieve methods 


Exercises 


Exercise 20.1. Let x = y". Given n € N, we write n, for its y-smooth part, that 
is to Say, Ny := [evingeg? Uniformly for all A C S(y), prove that 

Pr<a(ny € A) = Presiy)(n € A) + O(e"). 
Use the above relation to give an alternative proof of the Erdés-Kac theorem. 
Exercise 20.2° 
(a) For x > 2, prove that 


S > 13(p — 1) x xlogz. 


PKu 
[Hint: Show that a|njd<at/3 T(d) < 73(n) < 3 alnja<e2/2 T(d) when n < «| 


(b) Assume the Elliott-Halberstam conjecture. Prove that there is a constant c > 0 
such that 
S/ 13(p —1)=crlogr+O(x) (x21). 
pKa 
Exercise 20.3% Let f be a multiplicative function with 0 < f < Tx. 


(a) Adapt the proof of Shiu’s theorem to show that 
—2 
oy f(p—1) Xk xexp{ a. 
pKu pKx 7 


[Hint: You will need an estimate for #{n < x: P~(n(2an+1)) > y} uniformly 
ina€Nande>y21\ 


(b) Fix C > 10. Uniformly for 2 < k < Clog log, show that 
Vi. (log log x)* . 
(log «)?k:! 
[Hint: Use Chernoff’s inequality (a.k.a. Rankin’s trick).] 
(c) Show an estimate analogous to the one in part (a) for the sum 


S> of (p-1) 


e—y<pKu 
p=1+4+2a (mod 2q) 


#{p <a@:w(p—1) =k} Keo 


when (a(1+ 2a),q) =1 and y/q > x* for some fixed e > 0. 


Chapter 21 


Selberg’s sieve 


In 1947, Selberg introduced a different approach to sieving based on the 
simple fact that squares are non-negative. This allowed him to construct in 
one stroke a very general class of upper bound sieves often called A?-sieves. 

We start with a set of primes P and a function A : N > R that is 
supported on integers d|P and satisfies the condition A(1) = 1. Then 


(21.1) TeX (DG ie 


Indeed, if (n,P) = 1, then both sides of (21.1) equal 1; otherwise, the left 
side is 0 whereas the right one is non-negative. 


Opening the square in (21.1), we find that the right side equals (1 * 


AT)(n), where 
Yo Adi) A(d2) 


[di ,d2|=d 


Hence, \* is an upper bound sieve for the set of primes P. If, in addition, 
we assume that supp(A) C [1, VD], then supp(At) C [1, D]. We denote this 
special class of upper bound sieves A+ by A?(D,P). Lower bound sieves can 
also be obtained using Exercise [19.2 

We have thus produced a general class of sieve weights At for which the 
inequality (1 * AT)(n) > 1(n,py=1 is automatically satisfied. Optimizing the 
choice of At then becomes a calculus problem. Indeed, using Axiom [i] and 
the argument leading to sis we find that 


v([d1, da) 
(21.2) xo we +S xX r (di)X (dz) )rids, dz]: 
d;,d2 : d1,d2 


213 


214 21. Selberg’s sieve 


Ignoring the error term for now, we focus on minimizing the main term 


Gx ye y((di, da]) 


di o dy] 


This is a quadratic form in the variables A(d) with d < VD and d|P, under 
the restriction that A(1) = 

Selberg’s solution to this minimization problem was to diagonalize Q, 
since then finding the optimal choice of A becomes trivial. To motivate his 
argument, let us consider first the a case when v = 1. We then have 


(21.3) ox me a ve ) 5 MGA) (4. ay) 


did 
Fags 142 


A natural thing to do next is to set m = (d;,d2), so that dj = ma; with 
(a1, a2) = 1. We then find that 


A(may)A(maz) 
Q= BE Be ee 


(a1,a2)=1 


The problem is that the variables a, and a2 on the right-hand side are 
tangled via the condition (ai,a2) = 1. If we did not have this condition, the 
double sum would factor as a perfect square thus diagonalizing Q. 


We could replace the condition (a1,a2) = 1 using the MGébius inversion 
formula 1(q, a)=1 = ae u(d). Instead, we use a trick that untangles a; 
and ag in a simpler way: we go back to (21.3) and rewrite its right-hand 
side using the convolution identity 


(di,d2)= SY) vlm)= SD v(m). 


m|(di,d2) md, ,d2 


Together with the change of variables d; = maj, this implies that 


m A(mayz)A(ma m Mma) \? 
e= Oy ( ua 2) Od a ') 


a1,a2 


Setting €(m) = >>, A(ma)/a diagonalizes Q, which allows us to minimize it 
easily in terms of the new variables €. 


We now generalize the above idea to arbitrary functions v. First of all, 
note that we may assume that is supported on integers d|P’ with 


P! :={peEP:v(p) > 0}. 


This restriction is justified by simply observing that v({d1, d2]) = 0 whenever 
either d, or dg is a square-free integer with at least one prime factor from 


P\P 


21. Selberg’s sieve 215 


Now, for any choice of dj, d2|P’, the number [d1, da] is square-free. Hence, 


V([di,d2]) _ v(di)v(d2) (di,d2) __-v(di)v (da) . y*(m) 
(di, d2| dido v((d1, d2)) dido v(m)’ 


wherd]] 
(21.4) g*(m) =m][ (1 = to). 


We then find that 


m|d1,d2 


By oa) S DLO 


a 


where we let dj = maj for each 7. It is now clear that making the change of 
variables 


mn 
E(m) = Ineg Y, nae with Q:={d<VvD:dP’}. 
diagonalizes Q. We need to show that this is an invertible change of vari- 


ables, which we accomplish by an application of Mobius inversion. Since A 
is also supported on Y, for each d we have 


Sy  Mlonfadeden60m) 5m pay HEI 5 Mena 


me, d\m me, d\m 2 
es A(n)v(n) 
ee 5 2), Se a 
ne€J, d\n m: d|m|n 


Making the change of variables m = de in the innermost sum, we find that 
it equals ) 7 ein/q H(€) = In=a- We thus arrive at the inversion formula 


A(d)v(d m/d)u(m)E(m 
vd) _ 3 [el /On\ )&(m) 


(21.5) 
meQ, d\m 
This proves our claim about the invertibility of our change of variables. 
Recall our constraint (1) = 1 which, in view of (21.5), becomes 


Bis 5 Mmdutomgten) 


meg 


We have thus transformed our task to minimizing the quadratic form Q = 
Yin Y(m)y*(m)E(m)?/m? under condition (21.6), with € supported on J. 


1Note that v/y* = 6*, where 6* is the multiplicative function we saw in Exercise [19.1 


216 21. Selberg’s sieve 


We can solve the above minimization problem using Lagrange multipli- 
ers. Alternatively, we can use the Cauchy-Schwarz inequality: applying it 
with coefficients (v(m)y*(m))/?é(m)/m and Imegu(m)(v(m)/y*(m))*/?, 
we find that 


mMmI)YMN m ‘ 2(m)v(m 
1= (Sy LE cg yr Hemet, 


meg meg 


We know that the above inequality is an equality exactly when there is a 
constant DL #~ 0 such that 


(21.7) gm) =F Imeg HO 


for all m. To calculate the value of L, we use (21.6). This yields L = 
yimeg ¥(m)/y*(m), where we used that Y contains only square-free integers 
(so that 2(m) = 1 for each m € Y). The minimal value of Q is thus 


oil vim) 1 
7 aa 


meD 


The above calculations lead us to the following fundamental result. 


Theorem 21.1. Let A andP satisfy Aviom [IL If D >1 and —* is defined 
by (21.4), then 


X v(m) 
a w(d) = 
S(A,P) < 7 ~ ) oral with = ) 


d<D m<V/D 
d|P m|P 


Proof. Let € be defined by (21.7), where we recall that 9 = {d< JD: 
d|P’} with P’ = {p € PB: v(p) > 0}. Then, we define A(d) via relation 
(21.5) when d € ZY, whereas we set A(d) = 0 otherwise. We claim that 


(21.8) |A(d)| <1 whenever de Y. 


Before proving this inequality, let us see how it establishes the theorem. 


Indeed, the first term on the right-hand side of (21.2) equals X/L by the 
discussion preceding Theorem [21.1} whereas the second term is 
< dy ee IT(d;,do]| by virtue of (21.8). Given a square-free integer d, there 
are 3°) ways to write it as d = [d,, dz]. This completes the proof, assuming 


the validity of (21.8). 


21. Selberg’s sieve 217 


To prove (21.8), we first calculate A(d). For any d € JY, relation (21.5) 
and our choice of € = that 


22. wml eK m)&(m) 


7a me, d\m 
__d ae eae (m) 
~ Lv(d) J , m) 


mada w(d) __d a 
OL FO 2, oa) 


For each d € QY (that is necessarily square-free), we have the formula 
d/p*(d) = hae We thus find that 


a:dacD bla © 


The products ab with a and 0 as above are all distinct from each other, with 
each one of them determining a unique integer n € JY. Since v/y* > 0, we 
conclude that 


a:daeG bla © nEG e*( 


This proves our claim (21.8), thus completing the proof of the theorem. 


The sum L from the statement of Theorem[21.I|can be estimated asymp- 
totically using Theorem[[4.3]and Axiom[21 The vending bound for S(A, P) 


is given below. 


Theorem 21.2. Let A and P C {p < y}. Assume that Axioms 
and |3] hold, the second one with parameters k, k and c, and the third one 
with m = 3, A= «+1 and level of distribution D > y?. If, in addition, 
log X > logy, then 


S(A,P) < (X + Ox.ee(X/logy)) P(e +1)-e [] (1-* “P). 


pEP 


Proof. All implied constants might depend on «, k and «. By Theorem 
with D = y? and our assumptions on (A, ?), it suffices to show that 


oe é =| 
bis) ey = Te a II (1 - “te? + O((logy)*"*), 


* 
me<y ” (m) pEP 
m|P 


Indeed, since pep (1-4 (p)/p)* > (log y)* from Axiom[2] inverting (21.9) 
yields L~' = (e*7/T(«K +1) + O(1/logy)) II,ep(1 — v(p)/p) as needed. 


218 21. Selberg’s sieve 


To prove (21.9), we define the multiplicative function f by the relation 
my, _ ) lpep* lmai-¥(p)/(1—v(p)/p) ifp<y, 
f(p™) = a 
Tx (p™) ifp>y, 


where 7, is defined by (13.3). In particular, L = eer f(m)/m. We evalu- 
ate this sum using Theorem [14.3] whose conditions hold for f by Axiom [2] 
and Mertens’ estimates (Theorem B.4(c)). Hence, 


= fea Ty (1+ $4 Hs...) 1-2)" + ooguy, 


T'(« +1) D D 


The factors with p > y are all equal to 1. On the other hand, we have 


(ofA) 1-22) 


PSY pEP 


Using Mertens’ third estimate to evaluate [[,~, (1 — 1/p)* completes the 


proof of (21.9) and hence of the theorem. 


PSy 


As a direct corollary of Theorem with P = {p < V/z/(logx)”*}, we 
have an estimate for the number of prime values of an admissible k-tuple. 
(See Exercise [17.4] for the definition of an admissible k-tuple.) 


Corollary 21.3. Let h = (hi,..., hx) be an admissible k-tuple of distinct 
integers, and define vy(p) = #{ hj (modp):1<j <k}. Then 

G(h)x 

(log x)* 


#{n<a:n+hy,...,n+h, are all primes} < (2*k! +) 


with ¢ = Ox n(log log x/log x) and 


This result should be compared with the Hardy-Littlewood conjecture 
({7.14). In particular, taking h; = 0 and hg = 2, Corollary [21.3] implies 
that the number of twin primes is at most 8 times the expected amount. In 
Exercise 21.2] we will see an improvement of this result when k = 2. 


Our final application of Selberg’s sieve is an explicit version of the Brun- 
Titchmarsch inequality which shows that the number of primes < x in the 
progression a(modq) is at most 2 +e times the expected number when 
logx/logqg — oo. In Exercise we will see that improving this factor 
to 2 —€ would imply that there are no exceptional zeroes in Theorem [12.3 
However, this cannot be done using only Axioms and their variations, 
as we discuss after the proof of Theorem 


21. Selberg’s sieve 219 


Theorem 21.4 (The Brun-Titchmarsch inequality, II). Uniformly for q € 
N, a€ (Z/qZ)* and x > y > q, we have 

(2+e)y 
#{x-—y<p<x“4:p=a(modgq)} < ———_, 
: 3 9(q) log(2y/q) 


with € = O(log log(3y/q)/log(y/q)). 


Proof. There are two ways to obtain the claimed upper bound, both in- 
volving a trick to get the required uniformity in g. The more standard proof 
is quite similar to the proof of Theorem [20.1] and it is outlined in Exer- 
cise We present here an alternative proof that uses the monotonicity 
principle (see Exercise [19.1). 

Lett A={x-y<n<2:n=a(modgq)} and P = {p< z} with z to 
be determined. As in the proof of Theorem the pair (A,P) satisfies 
Axiom[I] with X = y/q, v(d) = 1(4,q)=1 and |ra| < 2. Hence, for any function 
X supported on square-free integers < z, implies that 


#{r-—y<p<x2:p=a(modgq)}<S(A,P) +2 


y (di) A(d2) 
< a > aaa) + 227 ||AI|5,5 
(dy d2,q)=1 sii 
where ||A||,, is the supremum norm of A. The sum over dj, dz in the main 
term can be written as At(m)1¢,q)=1 
m Fj 
m 


where A*(m) = Idi ,dgl=m A(d1)A(d2) is an upper bound sieve. Hence, 
Exercise [19.1{b) with 6)(m) = 1¢mq)=1/m and 62(m) = 1/m implies that 
A(di)A(dz) ( >. (di) (dz) 
0< ae sae hee facade, 
a ms [d1, dz] I] Pp a [d1, dz] 
(did2,q)=1 pX<z, piq dj,d2 
q (di) A(d2) 
cy Sees 
9(q) a [d1, do] 


We now choose the weights \ to be the optimal weights with respect to 
the function v(d) = 1 and the set of primes P = {p < z}, in which case 


||Alloo < 1 by (21.8) and 


since y* = y here. By Theorem[[Z3] we have )7,,<, u?(m)/p(m) = log z+ 
O(1). We thus conclude that 


log z + O(1)) 
Taking z = (y/q)!/*(log(2y/q))~1© completes the proof. 


+ O(z?). 


220 21. Selberg’s sieve 


The parity problem of sieve methods 


As we discussed above, improving the constant 2 in Theorem[21.4] would have 
the spectacular consequence of eliminating Landau-Siegel zeroes. However, 
Selberg proved that this is not possible using sieve methods and the mere 
assumption of Axioms[I}3]and their variations. To do so, he constructed sets 
A that satisfy Axioms and for which the true size of $(A,P) matches 
the upper bound provided by Theorem [21.2 


Indeed, let P = {p < Vz}, 
AY = {n<a@:A(n)isodd} and A ={n<«:Q(n) is even}. 
Note that 
AY? = 3 ; 


Nex 
d\n 


|x/d| i (—1)3+2@) > (12) 


: 2 m<a/d 
_ 2 | o(2,-cVwate/a) 
ag +? ( cl ) 


for some absolute constant c > 0, where the error is bounded using Exercise 
[8.4(d) and a convolution trick (i.e., we write (—1)° = yw * f). Theorem [13.2 
can also be used if we settle for a weaker error term. 

The above estimate implies that Axiom [lis satisfied with X = x/2 and 
v(d) = 1. In addition, Axiom 2] holds with « = k = 1, whereas Axiom B] 
holds with m = 3, A = 2 and D = 2/e(8 logz)? We may thus apply 
Theorem to conclude that 


(21.10) 0< S(AD, Vz) < (14 o(t) = (1 + 0(1)) 


log x 
as £—> OO. 


On the other hand, we can calculate $(A, \/z) directly. We know that 
any integer n > \/az counted by it must be prime. Therefore 


S(A, Ja) = a(x) + O(v2) 
as x — oo from the Prime Number Theorem, whereas 


5(A, Vz) = O(Vz) = o( —). 


log x 


~ log x 


So we see that, up to an error of size o(a/ log), the upper bound in (21.10) 
is sharp when j = 1, and the lower bound is sharp when j = 0. In particular, 
we cannot hope to improve upon unless we impose an extra axiom 
that eliminates the above examples. 


Exercises 221 


Because of the shape of Selberg’s extremal examples, the inability to 
improve upon under Axioms [I}Blis called the parity problem of sieve 
methods. The underlying reason that causes this obstruction is that the 
sieve weights we have constructed under Axioms [I}3] do not correlate with 
the Mobius function, so they cannot differentiate between integers with an 
even and an odd number of prime factors. 


In Chapter [23] we will present a method going back to Vinogradov that 
allows us to “break the parity barrier” for certain sequences (a,,)°2) using 
bilinear form methods. The book of Harman and the paper of Fried- 
lander and Iwaniec study such “parity-breaking sieves” in a much more 
systematic way. Let us briefly mention that the axiom imposed on A in 
concerns (roughly) bilinear sums of the form )?,<K cer anei(kl). Showing 
that there is cancellation in such sums means that the sequence A does not 
correlate with the Mobius function. 


Remark 21.5. In light of the above discussion, Chen’s theorem [24}/25] that 
there are infinitely many primes p such that p+ 2 is the product of at most 
two primes is the best possible result we can hope for using sieve methods 
and the mere assumption of Axioms 


Exercises 


Exercise 21.1. Let qe N and z 21. 


(a) Note that g/p(4) = Dag #2(4)/:o(d) and p2(m)/:p(m) = [pn (141 /P+1/p?+ 
+). Conclude that 


A ys Fo towt le) +. 


m<z,(m,q)=1 M<z MK<z 
(b) Use the above inequalities to prove that 
¥ 


ional) +22° +2 


#{x-—y<p<«2:p=a(modqg)}< 
t ( Ms 9(q) 
for all x > y > 1, and deduce Theorem 21.4] 


Exercise 21.2. Let h be an even integer, and let Lp, = >ip|n log p) /D- 

(a) Show that Ly < log(w(h)) + O(1). 

(b) Show that there are two absolute constants M, and Mp2 such that 

My, (Lp + log log 1) Adcox pat 


2 = 
(log x) dh en” 2 


#{p<a:p+hprime} < (1+ 
log x 


when x > e”24n, where cy denotes the twin prime constant as usual. [Hint: 
Use Exercise [14.7]| 


S| 


Chapter 22 


Sieving for zero-free 
regions 


Sieve methods provide an alternative way of establishing zero-free re- 
gions for Dirichlet series that are of the same strength as Theorems[8.3] and 
12.3] The idea is as follows: assume that we want to obtain a zero-free 
region for ¢ close to 1 + it, where |t| > 2. For any a € (0,1), we have 


1 
lo +it) =o + at) - f (a+ it)da 


Now, let 6 € (0,1) be such that the quantity M := supy_s<,<1|¢/(o + it)| 
satisfies the inequality 6M < |¢(1+ it)|/2. We then infer that 1/2 < |¢(o+ 
it)|/|¢(1 + it)| < 3/2 for o © [1 — 6,1]. Hence, we have reduced proving 
a zero-free region to an upper bound for ¢/(o + it) and a lower bound for 
¢(1+it), so that we can determine the largest 6 for which 6M < |¢(1+it)|/2. 


To bound ¢’, we argue as in Theorem[I1.2| but we need to be a bit more 
careful because the Dirichlet series representation of ¢ is not valid inside the 
critical strip. We apply the Euler-Maclaurin summation formula to obtain 
a generalization of (5.7): for each N € Zs, and for Re(s) > 1, we have 


N 1l-s 
@21) ¢s)= a+ a= yi s [eae 
n=1 


n>N 


Now, both ¢(s) and the rightmost expression in are well-defined for 
Re(s) > 0. In addition, they share the same singularities in this region (a 
simple pole of residue 1 at s = 1). Since they are equal for Re(s) > 1, they 
must also be equal for Re(s) > 0 by the identity principle. Differentiating 


222 


Sifted Dirichlet series 223 


yields the formula 


57 oan Ni =] NN *loeN °° fy}(1— slogy) 
ns (1—s)? l-s N yett 


('(s) = 


dy, 


valid for all Re(s) > 0 and all N € Zs;. We take N = ||t||, put absolute 
values everywhere and argue as in the proof of Lemma to find that 
¢'(s) = O(log? |t|) for o > 1—1/log|t| and |t| > 2. 

On the other hand, we have | log ¢(1+7t)| < log log |t} +O(1) by Exercise 
I8.4{c), whence |¢(1 + 7t)| >> 1/log |t|. This leads to a zero-free region of the 
form o > 1 — O(1/log® |t|) which is weaker than the one in Theorem 


To understand why we arrived at a weaker zero-free region, we must 
reexamine the above argument. Notice that to bound ¢’(s) we estimated 
trivially all the summands with n < |¢|. In fact, for the upper bound ¢’(s) = 
O(log? |t|) to be achieved, we must have that n” = 1 for all n < |t|. But then 
p’ = 1 for all primes p < |t|, so that Tete 4 — 1/p't#|-! = log |t|. This 
suggests we should be able to replace the lower bound |¢(1+7t)| >> 1/ log |¢| 
by the stronger estimate |¢(1 + it)| >> log |t|, which would then lead us to a 
zero-free region of the same strength as the one in Theorem 


Sifted Dirichlet series 


To get around the above issue, we introduce a truncated version of ¢. Instead 
of truncating in an archimedean way and considering the sum )7,,. y 1/n*, 
we truncate it multiplicatively and work with 


1 i\~ 
Gy (4) = SC ==TH(-5) : 
P-(n)>y p>y P 


This “multiplicatively truncated” version of ¢ has the advantage of possess- 
ing an Euler product representation. More generally, given a Dirichlet series 


F(s) = 0, f(n)/n8, we set 


(22.2) F,(s)= S> nan 


nm 
P-(n)>y 


When Fs) is a Dirichlet L-function, we have the following crucial estimate. 


Theorem 22.1. Let y be a Dirichlet character modq, s =o+it andy €R 
such that y > max{10,q(|t]}+1)} ando >1-—1/logy. If x is principal, we 
further assume that |t| > 1/logy. For j € {0,1}, we have 


(22.3) LY) (s,x) « (log y)’. 


224 22. Sieving for zero-free regions 


Before embarking on the proof of Theorem [22.1] let us see how we can 
use it to extract a zero-free region for L(s, x). Set 


(22.4) Qx,t = max{q(|t| + 1), 10, exp(1y=xo/é))}- 


When y > qy,t, we also have |t| > 1,—,,/ logy. Thus, Theorem 22.1]implies 
that L(o + it, x) < logy foro > 1—1/logy. As a consequence, 


Lyle + it, x) — 14 +4,.x) = / Li(a+ it, x)da < |1 — o| logy. 
1 


We thus infer that there is an absolute constant c > 0 such that 
__ Heyl + it, 9) 
log y 

In particular, we see that the size of |L,(1+ it, x)| controls the quality of the 
zero-free region we can obtain. Moreover, if the upper bound |L,(1+it, x)| < 
O(1) from Theorem is the true order of magnitude of |L,(1 + it, x)|, 
then we recover the zero-free region for L(s, x) given in Theorem [12.3 

We now prove Theorem After this task has been completed, we 
will see how we can control the size of L,(s, x). 


(22.5) [Lyle + it, x)| = |ig(l+a,%)| for |ja—1)< 


Sifted character sums 


The key to proving Theorem is the Fundamental Lemma of Sieve The- 
ory, which allows us to estimate character sums running over y-rough num- 
bers. Going from such an estimate to a bound for Ly(s, x) is then accom- 
plished by a routine partial summation argument. 


Lemma 22.2. Fort € R and x > y > max{q(|t| + 1), 10}, we have 


; gitit 1 gi—110/log y 
ni aoe PTY (1-3) 40S, 
Ss" x( ) X=X0 isa ll p logy 


N<Kxr 
P (n)>y 


Proof. Let A* be the sieve weights from Theorem with D-= «fa, 
P={p<y}and«=1. Set 6=At—A and note that 6 « 1 > 0, as well 
as that (AT * 1)(n) — (6 * 1)(n) < 1p-(n)sy < (AT * 1)(n). Consequently, 


(22.6) SD x(n = SO x(nyn-“#AF «1)(n) +O( O(*Y)). 
Nx nXx Nx 
P~(n)>y 
For the error term, we have 


So (6*1)(n) = S> d(m )|= |= x) At) +t O(a). 


nxn MKVJz MKVJE 


Sifted character sums 225 


Since log D/ log y = 0.5 log z/ log y here, Theorem yields the estimate 
(22.7) So6 *1)(n) < gi}10/logy / og y. 


NKx 


For the main term, we have 


S> x(n x(n —it eu * 1)( > A*(d d)d~** S> y(m)m-*. 
n<x d<Ja m<a/d 
We apply partial summation to remove the factor m~“ from the inner sum. 
Note that So ncy X() = ly=x0w¥(a)/¢ + O(q), by periodicity. Hence, 
1-i 


SE xlonjn = ya AP. P+ O((e| + Yatog(2)) 


m<w 


uniformly for w > 1. In turn, this implies that 
1—-it + 
ss e(q) «x A*(2)xo0(4) 

T x(n) #1) (2) = Nya. A VO 

(22.8) NKx a d<Ja 
+ O(/zxq(|t| + 1) logz). 

Since we have assumed that 2 > y > max{q(|t| +1), 10}1°°, we have 

Vzq({t| oe 1) logx < 9-51 Z gi 1t/log(10) < gi 110/log y_ 
In addition, note that 


ed) 5 a = (14 O(e- M8) TT (1 - =) 


q dx fa psy - 


by Theorem [19.1] since y > q here. Combining the above estimates with 


(22.6) and (22.7) completes the proof of the lemma. 


Proof of Theorem Let x and ¢ satisfy the hypotheses of the the- 

orem, and recall the definition of q,4 from (22.4). First, we prove (22.3) 

when y > ae and o € [1 — 100/ log y, 2]. From Lemma we know that 
1-it 


ao x 
(22.9) > Lp-(n)>yX(n)n-“ =a- <7 + R(x) for all x > y, 


N<Ka& 
where a = ly=yo [],<,(1 — 1/p) and R(x) = Ryty(x) « g}-110/ logy / Jog y. 
Hence, for all w € C with Re(w) > 1, partial summation implies that 


Lyw+it,x)=1+0 [> ee Ry) rw [hae 
y y 


qutit ye qutl 


(22.10) = = af ee Ww) +w fo BE a 
1 y 


wtit-—1 qed ye gut 


226 22. Sieving for zero-free regions 


In view of (22.9), the right-hand side of is meromorphic for Re(w) > 
1—110/logy. In addition, it has the same singularities as L,(w + it, x), 
i.e., a simple pole at w = 1 — it of residue a. Hence, must hold for 
Re(w) > 1—110/logy. Differentiating it 7 times and setting w = a, we 
infer that 


(IL) (8,0) = Lino + Gap af oe Auteuy 
°° R(x)[o(log x)F — j(log x)J—! 
+/ (x)[o( sibel ge) a 
y 


Since R(x) « x!—10/logy/logy and o > 1— 100/logy, the two terms 
involving R on the peas hand side of the above identity are < (logy). If 
x # x0, this proves because a = 0. Finally, if |t] > c,/logy, we note 
that a < 1/logy a —1| > |t| > c1/ logy, as well as [’ (log x)Ja~Sda « 
(logy). Putting together these estimates completes the proof of in 
this case as well. 

Finally, we prove when gyi < y < a and 0 > 1—1/logy. 
The case o > 2 is ea za the absolute convergence of LY)(s,y). Assume 
now that 1— 1/logy < o < 2 and let z = y?, so that z > ae and 
oa € [1 — 100/log z,2] Thus, the results we proved above apply wath z in 
place of y, that is to say, Lz(s,x) < 1 and L4(s,y) « logz. In addition, 
note that 


(22.11) |Ly(s,x)| = |£2(8,x)| [[ [1-x)/p*7? = |£2(8, x) «1, 
Y<pz 


where the product over p was bounded by observing that 1 —1/p < |1— 
x(p)/p*| < 1+ 1/p and then applying Mertens’ third estimate (Theorem 
[3.4(c)). Similarly, we have 


L, (8, x) = £,(s,x) II (1-2B\ 2 (8X) aD x(p™) log p ee 


ps 
Y<PKz y<p<z,m>1 
< |L2(s, x)| + |Ly(s, x)| log z < logy, 


where we used Mertens’ second and third estimates to obtain the first in- 
equality. This completes the proof of Theorem [22.1 


Pretentious multiplicative functions 


Relation reduces the proof of a zero-free region for L(s, x) to under- 
standing the size of Ly(1+ it, x). We shall accomplish the latter task using 
the theory of pretentious multiplicative functions. Our starting point is the 
following lemma. 


Pretentious multiplicative functions 227 


Lemma 22.3. Let f be a completely multiplicative function with |f| < 1 
and let F denote its Dirichlet series. For y > 2 anda > 1, we have 


log Fy(s -y4 are (1) with «= max{y,e/(°-Y}., 


pee 


Proof. Note that F,(s) = [],.,(1—f(p )/p’)—' by the complete multiplica- 
tivity of f. Since fl <1, we infer that 


(22.12) log Fy(s ery oe -y@. 


proymel1 pry 


Now, Chebyshev’s estimate and partial summation imply that 


1 
(22.13) So = «1 


p>el/(l-2) 


This proves the lemma when o > 1+ 1/logy (since x = y then). On the 
other hand, if o < 14 1/logy, so that o = 1+ 1/logz, then (22.12) and 
(22.13) imply that 


log Fy(s)= S- __ fle) _ soy, 


pit1/log x+it 
Y<pKu 


Finally, observe that p!/!°s* = 1+ O(log p/log x) for p < x, and recall that 
nee log p)/p <logax. This completes the proof of the lemma. 


For future reference, we record the following one-sided bound for the 
averages of y(p)/p'*”, which is obtained as a direct corollary of Theorem 
and Lemma [22.3 


Corollary 22.4. Let x be a Dirichlet character modq andt € R. Then 


—it 
yy Reel) <0(1) for all u> v > ayy. 


’ 


U<pKu " 
Lemma relates L,(1+it, x) to logarithmic averages of x(p)p~” and 
demonstrates that for L,(1+ it, x) to be small, the quantities y(p)p~“ must 


predominantly have negative real part. The most extreme case would be 
when y(p)p~” = —1 for most p, in which case we can think of y(n)n~” as 
“pretending to be” the Mo6bius function. To study more rigorously this type 


of arguments, we introduce the distance function 


Lf (p) — g(p)? 
(F503 U; ae 9 \ pe 


U<pxu 


228 22. Sieving for zero-free regions 


which is a variant of the distance function used in the proofs of Theorems 
[8.3] and Note that if |f(p)| = |g(p)| = 1 for all p € (u, v], then 


(22.14) (figure = So i= Aiea 


U<pKu 


Together with Lemma [22.3] this establishes the connection of Ly(s, x) to 
the distance function. 


With the above notation, we have the following significant strengthening 
of Corollary [22.4] that shows that there exists a parameter Y controlled by 
the size of L,(1+it, x) such that the average behavior of y(p)p~** undergoes 
a phase transition when p + Y: it is 0 on average when p > Y, whereas it 
is —1 on average when p< Y. 


Theorem 22.5. Let x be a Dirichlet character modq, t € R and y > qyt. 
There exists some Y = Y (x,t) € [y, +oo] such that 


(22.15) (x(n), w(n)n*; u,v) = O(1) when [u,v] C [y, Y) 
and 
(22.16) » Relator") =O(1) when [u,v] C [Y, +00). 


In fact, we have log Y/ logy = 1/|Ly(1 + it, x)|. 


Proof. We take Y = maxfy, y'/4vG+I1_ Since |Ly(1 + it, x)| < 1 from 
Theorem 22.1] we have that log Y/ logy = 1/|Ly(1 + it, x)|. 

To deal with the potential issue of having L(1 + it,y) = of we let 
Y, = y/HuGte+t0l for e > 0. In addition, let f-(n) = x(n)n-*—* and call 
F,,-(s) the associated sifted Dirichlet series (that equals Ly(s + € + it, x)). 


First, we prove a generalization of (22.16): we claim that 


(22.17) S> Reel?) _ O71) when [u,o] S [¥e,+00). 


U<pKvu £ 


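(Taking $\varepsilon=0$ in (22.17) recovers (22.16).) We may assume that $Y_\varepsilon<\infty$; otherwise, (22.17) is vacuous. For $\sigma>1$, Theorem implies that

$$F_{y,\varepsilon}(\sigma)=F_{y,\varepsilon}(1)+\int_1^\sigma F_{y,\varepsilon}'(\alpha)\,d\alpha=F_{y,\varepsilon}(1)+O((\sigma-1)\log y).\tag{22.18}$$

Hence, if $C$ is a large enough constant (independently of any parameter), we have that $|F_{y,\varepsilon}(\sigma)|\asymp|F_{y,\varepsilon}(1)|$ when $0<\sigma-1\le|F_{y,\varepsilon}(1)|/(C\log y)$. Since $|F_{y,\varepsilon}(1)|/\log y=1/\log Y_\varepsilon$, we infer that $|F_{y,\varepsilon}(1+1/\log u)|\asymp|F_{y,\varepsilon}(1+1/\log v)|$ for $v\ge u\ge Y_\varepsilon^C$. Taking logarithms and applying Lemma 22.3 twice, we arrive at the estimate

$$\sum_{u<p\le v}\frac{\mathrm{Re}(f_\varepsilon(p))}{p}=\sum_{y<p\le v}\frac{\mathrm{Re}(f_\varepsilon(p))}{p}-\sum_{y<p\le u}\frac{\mathrm{Re}(f_\varepsilon(p))}{p}=\log\frac{|F_{y,\varepsilon}(1+1/\log v)|}{|F_{y,\varepsilon}(1+1/\log u)|}+O(1)=O(1).$$

This completes the proof of (22.17) when $[u,v]\subset[Y_\varepsilon^C,\infty)$. For the remaining range, we simply note that $\sum_{Y_\varepsilon<p\le Y_\varepsilon^C}1/p=O(1)$.

¹We are assuming we have no knowledge about the zeros of $L(s,\chi)$.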


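Next, we prove (22.15). Fix, for the moment, $\varepsilon>0$. (We will eventually let $\varepsilon\to0^+$.) We then know that $Y_\varepsilon<\infty$ from the Euler product representation of $L(s,\chi)$. Now, for any $x\ge Y_\varepsilon$, we have that

$$\sum_{y<p\le Y_\varepsilon}\frac{\mathrm{Re}(f_\varepsilon(p))}{p}=\sum_{y<p\le x}\frac{\mathrm{Re}(f_\varepsilon(p))}{p}+O(1)=\log\Big|F_{y,\varepsilon}\Big(1+\frac1{\log x}\Big)\Big|+O(1),$$

where the first equality follows by (22.17) and the second one from Lemma. Letting $x\to\infty$, we deduce that

$$\sum_{y<p\le Y_\varepsilon}\frac{\mathrm{Re}(f_\varepsilon(p))}{p}=\log|F_{y,\varepsilon}(1)|+O(1)=-\sum_{y<p\le Y_\varepsilon}\frac1p+O(1),$$

where the second equality follows from the definition of $Y_\varepsilon$ and Mertens' second estimate (Theorem 3.4(b)). We conclude that

$$\sum_{y<p\le Y_\varepsilon}\frac{1+\mathrm{Re}(f_\varepsilon(p))}{p}=O(1).\tag{22.19}$$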


Now fix $[u,v]\subset[y,Y)$. If $\varepsilon$ is small enough, then $[u,v]\subset[y,Y_\varepsilon)$. In addition, all summands in (22.19) are non-negative, because $1+\mathrm{Re}(z)\ge0$ when $|z|\le1$. In conclusion,

$$0\le\sum_{u<p\le v}\frac{1+\mathrm{Re}(f_\varepsilon(p))}{p}\le O(1)$$

for all $\varepsilon$ that are sufficiently small. Letting $\varepsilon\to0^+$ completes the proof of (22.15) and hence of the theorem.
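The proof above leans on Mertens' second estimate in the form $\sum_{y<p\le Y}1/p=\log(\log Y/\log y)+O(1)$. A short Python check, with $y=100$ and $Y=10^6$ chosen purely for illustration (the helper names are hypothetical), shows how small the $O(1)$ term is in practice.

```python
import math


def primes_upto(n: int) -> list[int]:
    """Return all primes <= n (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def prime_reciprocal_sum(y: float, Y: float) -> float:
    """Sum of 1/p over primes y < p <= Y."""
    return sum(1 / p for p in primes_upto(int(Y)) if p > y)


y, Y = 100, 10**6
lhs = prime_reciprocal_sum(y, Y)
rhs = math.log(math.log(Y) / math.log(y))   # here log(log Y / log y) = log 3
print(f"{lhs:.4f} vs {rhs:.4f}")
```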


We now combine the above result with the ideas used in the proof of 
Theorem Note that the result we obtain below can be combined with 
(22.5) to yield a result that is almost as strong as Theorem 12.3.


Theorem 22.6. Consider a Dirichlet character $\chi$ mod $q$, and real numbers $t$ and $y\ge\max\{q(|t|+2),10\}$.
(a) (i) If either $\chi$ is not real or $|t|\ge1/\log y$, then $|L_y(1+it,\chi)|\asymp1$.

(ii) If $\chi$ is real and non-principal and $|t|\le1/\log y$, then

$$|L_y(1+it,\chi)|\asymp\max\{L_y(1,\chi),\,|t|\log y\}.$$




(b) Assume that $\chi$ is real and non-principal, and let $\chi'$ (mod $q'$) be another real, non-principal character that is not induced by the same primitive character as $\chi$. If $L_y(1,\chi)\ge L_y(1,\chi')$ for some $y\ge\max\{q,q'\}$, then $L_y(1,\chi)\asymp1$.


Proof. (a-i) Our assumptions on $y$, $\chi$ and $t$ imply that $y\ge q_{\chi,t}$ and $y\ge q_{\chi^2,2t}$.

Now, let $Y$ be as in Theorem 22.5. Since we already know that $|L_y(1+it,\chi)|\ll1$ from Theorem 22.1, it suffices to prove that $|L_y(1+it,\chi)|\gg1$ or, equivalently, that $\log Y\ll\log y$. Theorem 22.5 implies that $\mathbb{D}(\chi(n),\mu(n)n^{it};y,Y)\ll1$. Applying Minkowski's inequality as in the proof of Theorem 12.3 (or simply noticing that $|z^2-w^2|\le2|z-w|$ when $|z|,|w|\le1$), we have

$$\mathbb{D}(\chi^2(n),n^{2it};y,Y)\le2\,\mathbb{D}(\chi(n),\mu(n)n^{it};y,Y)\ll1.$$

On the other hand, Corollary 22.4 and relation (22.14) imply that

$$\mathbb{D}(\chi^2(n),n^{2it};y,Y)^2\ge\log(\log Y/\log y)-O(1).$$

Comparing the above estimates, we find that $\log Y\ll\log y$, as needed.
(a-ii) Let $z=e^{1/|t|}$, so that $z\ge y\ge\max\{q(|t|+2),10\}$ and $|t|\ge1/\log z$. Hence, $|L_z(1+it,\chi)|\asymp1$ by part (a-i) applied with $z$ in place of $y$. Together with Theorem 22.5, this implies that

$$\sum_{z<p\le x}\frac{\mathrm{Re}(\chi(p)p^{-it})}{p}=O(1)\qquad(x\ge z).$$

We combine the above estimate with Lemma to find that

$$\log|L_y(1+it,\chi)|=\lim_{x\to\infty}\log|L_y(1+1/\log x+it,\chi)|=\sum_{y<p\le z}\frac{\mathrm{Re}(\chi(p)p^{-it})}{p}+O(1).\tag{22.20}$$

We have $|p^{-it}-1|=\big|\int_0^t(\log p)p^{-iu}\,du\big|\le|t|\log p$. Hence,

$$\log|L_y(1+it,\chi)|=\sum_{y<p\le z}\frac{\chi(p)}{p}+O(1).$$

Let $Y$ be as in Theorem 22.5 with $t=0$. Then

$$\log|L_y(1+it,\chi)|=\sum_{y<p\le\min\{Y,z\}}\frac{\chi(p)}{p}+O(1)=-\sum_{y<p\le\min\{Y,z\}}\frac1p+O(1),$$

where we first applied (22.16), followed by an application of (22.15) (both with $t=0$). Since $\log Y\asymp(\log y)/L_y(1,\chi)\gg\log y$, Mertens's second estimate completes the proof of part (a-ii).


(b) Let $Y=\max\{y,y^{1/L_y(1,\chi)}\}$ and $Y'=\max\{y,y^{1/L_y(1,\chi')}\}$, as in the proof of Theorem 22.5. Since $L_y(1,\chi)\ge L_y(1,\chi')$, we must have that $Y\le Y'$. Hence, Theorem 22.5 implies that $\mathbb{D}(\psi,\mu;y,Y)\ll1$ for $\psi\in\{\chi,\chi'\}$.

Using Minkowski's inequality, we infer that $\mathbb{D}(\chi,\chi';y,Y)\ll1$. On the other hand, Corollary 22.4 and relation (22.14) yield that

$$\mathbb{D}(\chi,\chi';y,Y)^2\ge\log(\log Y/\log y)-O(1),$$

where we used that $\chi\chi'$ is a non-principal character mod $[q,q']\le y^2$, which follows by our assumption that $\chi$ and $\chi'$ are induced by different primitive characters. Hence, $\log Y\ll\log y$, which completes the proof of the theorem.


Exercises 


Exercise 22.1. Let $f$ and $F$ be as in Lemma 22.3. Fix $t\in\mathbb{R}$ and $y\ge2$, and assume that the function $\sigma\mapsto F(\sigma+it)$ is continuously differentiable for $\sigma\ge1-1/\log y$, as well as that $|F^{(j)}(\sigma+it)|\ll(\log y)^j$ uniformly for $j\in\{0,1\}$ and $\sigma\ge1-1/\log y$. Let $Y=y^{1/\min\{1,|F_y(1+it)|\}}$.

(a) Prove that $\mathbb{D}(f(n),\mu(n)n^{it};u,v)=O(1)$ when $[u,v]\subset[y,Y)$, and that $\sum_{u<p\le v}\mathrm{Re}(f(p)p^{-it})/p=O(1)$ when $[u,v]\subset[Y,+\infty)$.
(b) Prove that there is an absolute constant $c>0$ such that

$$|F_y(\sigma+it)|\asymp\begin{cases}\log y/\log Y&\text{if }1-c/\log Y\le\sigma\le1+1/\log Y,\\(\sigma-1)\log y&\text{if }1+1/\log Y\le\sigma\le1+1/\log y,\\1&\text{if }\sigma\ge1+1/\log y.\end{cases}$$


Exercise 22.2* Assume there are constants $\varepsilon>0$, $L\ge2$ and $q_0$ such that

$$\pi(x;q,a)\le\frac{(2-\varepsilon)x}{\varphi(q)\log x}\qquad(x\ge q^L,\ q\ge q_0).$$

Show that there is some $c=c(q_0,L,\varepsilon)>0$ such that, for all moduli $q\ge1$, the function $\prod_{\chi\,(\mathrm{mod}\,q)}L(s,\chi)$ has no zeroes with $\sigma\ge1-c/\log(q(|t|+2))$ (i.e., there are no Landau-Siegel zeroes). [Hint: Given a real, non-principal character $\chi$ (mod $q$), prove a lower bound for the sum $\sum_{y<p\le x}(1+\chi(p))/p$.]


Exercise 22.3* Let $\chi$ be a real, non-principal Dirichlet character mod $q$. Theorem 22.6 proves that if $L_y(1,\chi)\gg1$, then $L(s,\chi)$ does not have a Landau-Siegel zero. This exercise shows that the converse is also true. Moreover, it establishes a precise connection between the location of a potential Landau-Siegel zero and the size of $L_y(1,\chi)$.

Throughout, $y:=\max\{q,10\}^{100}$, $x\ge y$, $u=\log x/\log y$, $1-1/\log y\le\sigma<1$ and $V(y)=\prod_{p\le y}(1-1/p)$.
(a) Let $f_\sigma(x)=(x^{1-\sigma}-1)/(1-\sigma)=\int_1^xw^{-\sigma}\,dw$. Show that there is a constant $\gamma_{\sigma,y}=O(\log y)$ such that

$$\sum_{\substack{n\le x\\P^-(n)>y}}\frac1{n^\sigma}=\big(f_\sigma(x)+\gamma_{\sigma,y}\big)V(y)+O(e^{-u}).$$




(b) For $A'\ge A\ge y$ and $x\ge y$, show that
y 


3 x(a) fo(a/a) _ [* S x(a) dw — 2? log(xA) 
a? — ‘ a’ we Al00/logy ° 

A<axA’ A<ac<min{A’,«/w} 

P(a)>y P (a)>y 


Conclude that 


ye OH9O) fF Lay - (25 rou) Eolas 0) OE. 


pre ne l-o 1 

P-(n)>y 
[Hint: Recall Lemma 22.2.]

(c) Show that there is an absolute constant $c>0$ such that if $L_y(\sigma,\chi)\ge0$ for some $\sigma\in[1-c/\log y,1)$, then $L_y(1,\chi)\gg(1-\sigma)\log y$. [Hint: Examine the sign of $(1/(\sigma-1)-\gamma_{\sigma,y})L_y(\sigma,\chi)$.]

(d) Show that if $L(\sigma,\chi)\ne0$ for $\sigma\in[1-c/\log q,1]$, then $L_y(1,\chi)\gg1$.

(e) If there is $\beta\in[1-c/\log q,1]$ such that $L(\beta,\chi)=0$, then show that $1-\beta\asymp L_y(1,\chi)/\log q$. [Hint: To prove the upper bound on $1-\beta$, use part (c). To prove the lower bound, use the Fundamental Theorem of Calculus.]


Part 5 


Bilinear methods 


Chapter 23 


Vinogradov’s method 


The parity barrier of sieve methods prevents us from getting tight bounds on $\sum_{p\le x}a_p$ under the mere assumption of Axioms for the sequence $\mathcal{A}=(a_n)_{n=1}^\infty$. In 1934, I. M. Vinogradov¹ developed a new method for estimating $\sum_{p\le x}a_p$ when $\mathcal{A}$ satisfies certain additional hypotheses.
To simplify the exposition of Vinogradov's idea, let us assume that $|a_n|\le1$ for all $n$. We then have

$$\sum_{p\le x}a_p=\sum_{\substack{n\le x\\P^-(n)>\sqrt x}}a_n+O(\sqrt x).$$


Applying a variant of Buchstab's identity (19.1) to the right-hand side yields that

$$\sum_{p\le x}a_p=\sum_{\substack{n\le x\\P^-(n)>x^\varepsilon}}a_n-\sum_{x^\varepsilon<p\le\sqrt x}\ \sum_{\substack{n\le x\\P^-(n)=p}}a_n+O(\sqrt x),\tag{23.1}$$


where $\varepsilon>0$ is at our disposal. If we assume that the sequence $\mathcal{A}$ satisfies a suitable version of Axioms, the first sum on the right-hand side of (23.1) can be estimated accurately using the Fundamental Lemma of Sieve Theory (Theorem 18.11) for small enough values of $\varepsilon$. Thus, it remains to handle the double sum over $p$ and $n$.
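The combinatorial content of (23.1) can be checked numerically. In the Python sketch below (random coefficients with $|a_n|\le1$; the values $x=20000$, $\varepsilon=0.2$ and the helper names are illustrative assumptions), the first sum minus the double sum equals $\sum_{n\le x,\,P^-(n)>\sqrt x}a_n$ exactly, and that quantity differs from $\sum_{p\le x}a_p$ by at most $\pi(\sqrt x)+1\le\sqrt x+1$. The convention $P^-(1)=+\infty$ is encoded by a sentinel value.

```python
import math
import random


def smallest_prime_factors(n: int) -> list[int]:
    """spf[m] = least prime factor of m for m >= 2; spf[1] = n + 1 encodes P^-(1) = +infinity."""
    spf = list(range(n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if spf[i] == i:                       # i is prime
            for j in range(i * i, n + 1, i):
                if spf[j] == j:
                    spf[j] = i
    spf[1] = n + 1
    return spf


x, eps = 20_000, 0.2
rng = random.Random(1)
a = [0.0] + [rng.uniform(-1.0, 1.0) for _ in range(x)]   # a[n] for 1 <= n <= x
spf = smallest_prime_factors(x)
root, x_eps = math.sqrt(x), x ** eps

first = sum(a[n] for n in range(1, x + 1) if spf[n] > x_eps)            # P^-(n) > x^eps
double = sum(a[n] for n in range(2, x + 1) if x_eps < spf[n] <= root)  # x^eps < P^-(n) <= sqrt(x)
rough = sum(a[n] for n in range(1, x + 1) if spf[n] > root)            # P^-(n) > sqrt(x)
prime_part = sum(a[p] for p in range(2, x + 1) if spf[p] == p)         # sum over primes p <= x
```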


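Writing $n=pm$, we find that

$$B:=\sum_{x^\varepsilon<p\le\sqrt x}\ \sum_{\substack{n\le x\\P^-(n)=p}}a_n=\sum_{x^\varepsilon<p\le\sqrt x}\ \sum_{\substack{mp\le x\\P^-(m)\ge p}}a_{mp}.$$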


¹Not to be confused with A. I. Vinogradov from the Bombieri-Vinogradov theorem.




The right-hand side closely resembles a bilinear sum

$$\sum_{k=1}^K\sum_{\ell=1}^La_{k\ell}\,x_ky_\ell\tag{23.2}$$

for appropriate coefficients $x_k$ and $y_\ell$. There is a small technicality: the variables $p$ and $m$ are weakly entangled via the relations $pm\le x$ and $P^-(m)\ge p$. We can easily decouple them, though: we (roughly) have

$$B\approx\sum_{x^\varepsilon<2^j\le x^{1/2}}B_j\qquad\text{with}\qquad B_j:=\sum_{2^{j-1}<p\le2^j}\ \sum_{\substack{m\le x/2^j\\P^-(m)>2^j}}a_{pm},\tag{23.3}$$

so that $B$ is a sum of $O(\log x)$ bilinear sums ($B_j$ is of the form (23.2) with $K=x/2^j$, $L=2^j$, $x_k=\mathbb{1}_{P^-(k)>2^j}$ and $y_\ell=\mathbb{1}_{\ell\text{ is prime},\,\ell\in(2^{j-1},2^j]}$).

Vinogradov's groundbreaking idea is that, for certain special sequences $\mathcal{A}$, we can obtain strong estimates for the bilinear sum no matter what the coefficients $x_k$ and $y_\ell$ are, as long as they are of controlled size (e.g., if $|x_k|,|y_\ell|\le1$ for all $k,\ell$) and as long as both $K$ and $L$ are large, so that we have genuine bilinearity.² We may thus forget the precise definition of $x_k$ and $y_\ell$. If this alleged bilinear estimate (which we can think of as "Axiom 4" of sieve theory) is available in a large enough region of $K$ and $L$ so that both terms on the right-hand side of can be handled (the first one by Axioms 1-3 and the second one by Axiom 4), we can break the parity barrier and extract primes from the sequence $(a_n)_{n=1}^\infty$.

We will explain Vinogradov's method more rigorously in the subsequent sections. But first let us note that Axiom 3 of sieve methods can also be thought of as an estimate for a bilinear sum of the form (23.2), but with $y_\ell=1$ for all $\ell$. Indeed, if $(a_n)_{n=1}^\infty$ is supported on $[1,L]$ and we assume Axiom, then

$$\sum_{k=1}^K\sum_{\ell=1}^La_{k\ell}\,x_k=\sum_{k=1}^Kx_kA_k=X\sum_{k=1}^Kx_kg(k)+\sum_{k=1}^Kx_kr_k,$$

where $A_k$ is defined by (18.2). If we assume that $|x_k|\le1$ and that Axiom 3 holds with level of distribution $D\ge K$, then we can obtain a strong estimate for $\sum_{k\le K}x_kr_k$. Conversely, if we can estimate this sum for any choice of $x_k$, we can also estimate it when $x_k$ is the sign of $r_k$, which brings us right back to Axiom 3.

In conclusion, we may think of Axiom 3 as a bilinear estimate with the coefficients $y_\ell$ being smooth functions of $\ell$. This point of view will be important in the next section.


²If, for instance, $K=1$, then the expression in (23.2) becomes a sum over a single variable.
We want to avoid such degenerate situations. 




Two types of functions 


Various technicalities in Vinogradov's method are simplified if instead of the sum $\sum_{p\le x}a_p$ we work with $\sum_{n\le x}a_n\Lambda(n)$. Indeed, the combinatorial identity $\Lambda=\mu*\log$ readily implies that

$$\sum_{n\le x}a_n\Lambda(n)=\sum_{k\ell\le x}a_{k\ell}\,\mu(k)\log\ell.\tag{23.4}$$

We thus see right away that $\sum_{n\le x}a_n\Lambda(n)$ has some sort of bilinear structure. To bring the right-hand side of (23.4) into the form (23.2), we localize $k$ into a dyadic interval $(2^{j-1},2^j]$, so that $\ell\le x/2^{j-1}$. As we briefly mentioned before, the method of bilinear sums is efficient only when both $k$ and $\ell$ are "long variables", that is to say, when $2^j$ and $x/2^j$ are both large (say when $D\le2^j\le x/D$). On the other hand, when $2^j\le D$, we can take advantage of the fact that the long variable $\ell$ is weighted with the smooth function $\log$. Hence, this part of the sum can be handled too, provided that we have at our disposal an appropriate version of Axiom 3, as per the discussion at the end of the previous section. It remains to handle the summands with $x/D<2^j\le2x$. If we can rewrite this part of the sum as a linear combination of sums that fit into one of the two above categories (i.e., a combination of some bilinear sums, and of some other ones with at least one smooth variable), we will have completed the estimation of $\sum_{n\le x}a_n\Lambda(n)$.
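Identity (23.4) is easy to confirm numerically. The Python sketch below builds tables of $\mu$ and $\Lambda$ from a smallest-prime-factor sieve and compares both sides of (23.4) for random weights $a_n\in[-1,1]$; the helper name `mobius_and_mangoldt` and the value $x=3000$ are illustrative assumptions.

```python
import math
import random


def mobius_and_mangoldt(n: int) -> tuple[list[int], list[float]]:
    """Tables of mu(m) and Lambda(m) for 0 <= m <= n (index 0 unused)."""
    spf = list(range(n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if spf[i] == i:
            for j in range(i * i, n + 1, i):
                if spf[j] == j:
                    spf[j] = i
    mu, lam = [0] * (n + 1), [0.0] * (n + 1)
    mu[1] = 1
    for m in range(2, n + 1):
        p = spf[m]
        r = m // p
        mu[m] = 0 if r % p == 0 else -mu[r]
        while r % p == 0:                          # strip all factors of p
            r //= p
        lam[m] = math.log(p) if r == 1 else 0.0    # Lambda(m) = log p iff m = p^k
    return mu, lam


x = 3000
rng = random.Random(0)
a = [0.0] + [rng.uniform(-1.0, 1.0) for _ in range(x)]
mu, lam = mobius_and_mangoldt(x)
lhs = sum(a[n] * lam[n] for n in range(1, x + 1))
rhs = sum(a[k * l] * mu[k] * math.log(l)
          for k in range(1, x + 1) if mu[k]
          for l in range(2, x // k + 1))           # l = 1 contributes log 1 = 0
```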


This brings us to the heart of Vinogradov's method: given $x\ge1$, we seek an identity of the form

$$\Lambda(n)=\sum_{1\le j\le J}(f_j*g_j)(n)+R(n)\qquad\text{for }n\le x,\tag{23.5}$$

where the function $R$ is a negligible "remainder term" in the sense that $\sum_{n\le x}|a_nR(n)|$ is small compared to $\sum_{n\le x}|a_n|$, and for each $j$ the summands $f_j*g_j$ fall into one of the following two categories:

I) $\mathrm{supp}(f_j)\subset[1,y_j]$ for some $y_j$ that is small compared to $x$ and $g_j\in C^\infty(\mathbb{R}_{\ge1})$. We then call $f_j*g_j$ a quasi-smooth or type I function and refer to the sum

$$\sum_{n\le x}a_n(f_j*g_j)(n)=\sum_{k\le y_j}f_j(k)\sum_{\ell\le x/k}a_{k\ell}\,g_j(\ell)$$

as a quasi-smooth, quasi-linear or type I sum.

II) $\mathrm{supp}(f_j)\subset[1,y_j]$ and $\mathrm{supp}(g_j)\subset[1,z_j]$, where $D_j\le y_j,z_j\le x/D_j$ for some large $D_j$. We then call $f_j*g_j$ a type II function and its average

$$\sum_{n\le x}a_n(f_j*g_j)(n)=\sum_{\substack{k\le y_j,\ \ell\le z_j\\k\ell\le x}}a_{k\ell}\,f_j(k)g_j(\ell)$$

a bilinear or type II sum.




Decomposing von Mangoldt’s function 


Vaughan's identity. One of the simplest and most useful ways to arrive at an identity of the form (23.5) was discovered by Vaughan. Given an arithmetic function $f$ and a parameter $V$, we write

$$f_{\le V}(n):=\mathbb{1}_{n\le V}\,f(n)\qquad\text{and}\qquad f_{>V}(n):=\mathbb{1}_{n>V}\,f(n).\tag{23.6}$$

With the above notation, the identity $\Lambda=\mu*\log$ can be written as

$$\Lambda=\mu_{\le V}*\log+\mu_{>V}*\log.\tag{23.7}$$

The first term on the right-hand side of (23.7) is of type I. But the second term is neither of type I nor of type II. To proceed, we replace $\mu_{>V}$ by $\mu_{\le V}$ using Möbius inversion: we have

$$\mu_{>V}*1=\delta-\mu_{\le V}*1,\tag{23.8}$$

where we recall the notation $\delta(n)=\mathbb{1}_{n=1}$ from Chapter 3. As preparation for inserting (23.8) into (23.7), we write the latter formula as

$$\Lambda=\mu_{\le V}*\log+\mu_{>V}*1*\Lambda.$$

Because $\Lambda$ has unrestricted support, we first split it as $\Lambda=\Lambda_{\le U}+\Lambda_{>U}$, where $U$ is some parameter, and then apply (23.8) only to the part of $\Lambda$ supported on $[1,U]$. We conclude that

$$\Lambda=\mu_{\le V}*\log+\mu_{>V}*1*\Lambda_{>U}+(\delta-\mu_{\le V}*1)*\Lambda_{\le U}.$$

We have thus proven Vaughan's identity:

Lemma 23.1. For any $U,V\ge1$, we have

$$\Lambda=\mu_{\le V}*\log-(\Lambda_{\le U}*\mu_{\le V})*1+(\Lambda_{>U}*1)*\mu_{>V}+\Lambda_{\le U}.\tag{23.9}$$
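Vaughan's identity (23.9) holds pointwise, so it can be verified directly on a range of integers. The Python sketch below computes Dirichlet convolutions with plain lists and checks that $\Lambda^\sharp+\Lambda^\flat+\Lambda_{\le U}$ (in the notation of (23.10) and (23.11) below) reproduces $\Lambda(n)$ for all $n\le N$, together with the pointwise bound $|\Lambda_{\le U}*\mu_{\le V}|\le\log$. The helper names and the parameters $N=2000$, $U=15$, $V=20$ are illustrative assumptions.

```python
import math


def mobius_and_mangoldt(n: int) -> tuple[list[int], list[float]]:
    """Tables of mu(m) and Lambda(m) for 0 <= m <= n (index 0 unused)."""
    spf = list(range(n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if spf[i] == i:
            for j in range(i * i, n + 1, i):
                if spf[j] == j:
                    spf[j] = i
    mu, lam = [0] * (n + 1), [0.0] * (n + 1)
    mu[1] = 1
    for m in range(2, n + 1):
        p, r = spf[m], m // spf[m]
        mu[m] = 0 if r % p == 0 else -mu[r]
        while r % p == 0:
            r //= p
        lam[m] = math.log(p) if r == 1 else 0.0
    return mu, lam


def dirichlet(f: list, g: list, n: int) -> list[float]:
    """Dirichlet convolution (f * g)(m) for 1 <= m <= n."""
    h = [0.0] * (n + 1)
    for k in range(1, n + 1):
        if f[k]:
            for l in range(1, n // k + 1):
                h[k * l] += f[k] * g[l]
    return h


N, U, V = 2000, 15, 20                                   # illustrative parameters
mu, lam = mobius_and_mangoldt(N)
one = [0.0] + [1.0] * N
log_ = [0.0] + [math.log(m) for m in range(1, N + 1)]
mu_le = [c if m <= V else 0 for m, c in enumerate(mu)]
mu_gt = [c if m > V else 0 for m, c in enumerate(mu)]
lam_le = [c if m <= U else 0.0 for m, c in enumerate(lam)]
lam_gt = [c if m > U else 0.0 for m, c in enumerate(lam)]

lam_mu = dirichlet(lam_le, mu_le, N)                     # Lambda_{<=U} * mu_{<=V}
sharp = [s - t for s, t in zip(dirichlet(mu_le, log_, N), dirichlet(lam_mu, one, N))]
flat = dirichlet(dirichlet(lam_gt, one, N), mu_gt, N)    # (Lambda_{>U} * 1) * mu_{>V}
vaughan = [s + b + r for s, b, r in zip(sharp, flat, lam_le)]
```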


The function $\Lambda_{\le U}$ is supported on small integers and hence contributes a negligible amount to averages of $\Lambda$. The function $\mu_{\le V}*\log$ is a quasi-smooth convolution: the first factor is a bounded function supported on integers $\le V$. Similarly, the function $(\Lambda_{\le U}*\mu_{\le V})*1$ is also a quasi-smooth convolution, with the factor $\Lambda_{\le U}*\mu_{\le V}$ being supported on $[1,UV]$ and satisfying the pointwise bound $|\Lambda_{\le U}*\mu_{\le V}|\le\Lambda*1=\log$. We denote the total contribution to $\Lambda$ of these two type I functions by

$$\Lambda^\sharp:=\mu_{\le V}*\log-(\Lambda_{\le U}*\mu_{\le V})*1.\tag{23.10}$$

Finally, the function

$$\Lambda^\flat:=(\Lambda_{>U}*1)*\mu_{>V}\tag{23.11}$$

is of type II: its first factor is supported on integers $>U$ and its second one on integers $>V$.

A very useful feature of $\Lambda^\flat$ is that one of its factors is the Möbius function, which is completely aperiodic (see Corollary and Exercise 23.4). As a result, $\Lambda^\flat$ typically contributes to the error term in the estimation of the sum $\sum_{n\le x}a_n\Lambda(n)$, so that the main term comes from $\Lambda^\sharp$. We thus think of $\Lambda^\sharp$ as the "structured" part of $\Lambda$. It resembles a sieve-type weight and we need a suitable version of Axiom 3 to estimate its averages. On the other hand, we think of $\Lambda^\flat$ as an "unstructured/random" error term, and we usually treat it using bilinear methods.


Remark 23.2. By definition, we have

$$\Lambda^\flat(n)=\sum_{\substack{k\ell=n\\\ell>V}}(\Lambda_{>U}*1)(k)\,\mu(\ell).$$

When $n\le x$, we have $U<k=n/\ell<x/V$. However, we often need better control of the support of the variables $k$ and $\ell$. To achieve this goal, we cover the interval $(U,x/V]$ by dyadic intervals $(2^{j-1},2^j]$, where $2^j\in(U,2x/V]$. If $k\in(2^{j-1},2^j]$, we also have that $\ell=n/k<x/2^{j-1}$. This leads us to the more accurate decomposition

$$\Lambda^\flat(n)=\sum_{U<2^j\le2x/V}(f_j*g_j)(n)\qquad\text{for }n\le x,\tag{23.12}$$

where $f_j(k)=(\Lambda_{>U}*1)(k)\,\mathbb{1}_{2^{j-1}<k\le2^j}$ and $g_j(\ell)=\mu(\ell)\,\mathbb{1}_{V<\ell\le x/2^{j-1}}$.
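Since every nonzero term of $\Lambda^\flat(n)$ with $n\le x$ has $U<k<x/V$, the dyadic decomposition (23.12) is exact for $n\le x$. The Python sketch below confirms this numerically; the helper names and the parameters $x=3000$, $U=10$, $V=12$ are illustrative assumptions.

```python
import math


def mobius_and_mangoldt(n: int) -> tuple[list[int], list[float]]:
    """Tables of mu(m) and Lambda(m) for 0 <= m <= n (index 0 unused)."""
    spf = list(range(n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if spf[i] == i:
            for j in range(i * i, n + 1, i):
                if spf[j] == j:
                    spf[j] = i
    mu, lam = [0] * (n + 1), [0.0] * (n + 1)
    mu[1] = 1
    for m in range(2, n + 1):
        p, r = spf[m], m // spf[m]
        mu[m] = 0 if r % p == 0 else -mu[r]
        while r % p == 0:
            r //= p
        lam[m] = math.log(p) if r == 1 else 0.0
    return mu, lam


def dirichlet(f: list, g: list, n: int) -> list[float]:
    """Dirichlet convolution (f * g)(m) for 1 <= m <= n."""
    h = [0.0] * (n + 1)
    for k in range(1, n + 1):
        if f[k]:
            for l in range(1, n // k + 1):
                h[k * l] += f[k] * g[l]
    return h


x, U, V = 3000, 10, 12                                  # illustrative parameters
mu, lam = mobius_and_mangoldt(x)
one = [0.0] + [1.0] * x
lam_gt_1 = dirichlet([c if m > U else 0.0 for m, c in enumerate(lam)], one, x)  # Lambda_{>U} * 1
flat = dirichlet(lam_gt_1, [c if m > V else 0 for m, c in enumerate(mu)], x)   # (23.11)

total = [0.0] * (x + 1)
j = 0
while 2 ** j <= 2 * x / V:
    if 2 ** j > U:                                      # dyadic ranges with U < 2^j <= 2x/V
        lo = 2 ** (j - 1)
        f_j = [c if lo < k <= 2 ** j else 0.0 for k, c in enumerate(lam_gt_1)]
        g_j = [c if V < l <= x / lo else 0 for l, c in enumerate(mu)]
        for n, val in enumerate(dirichlet(f_j, g_j, x)):
            total[n] += val
    j += 1
```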


Presieving $\Lambda$. On many occasions, it is advantageous to use a variant of Vaughan's identity whose summands enjoy slightly different properties. A simple way of obtaining such a variant is by presieving $\Lambda$. Indeed, since primes do not have small prime factors, we write

$$\Lambda(n)=\Lambda(n)\cdot\mathbb{1}_{P^-(n)>y}+\Lambda(n)\cdot\mathbb{1}_{P^-(n)\le y}.$$

We expect $\Lambda(n)\cdot\mathbb{1}_{P^-(n)\le y}$ to be small on average because it is supported on prime powers $p^m$ with $p\le y$. Next, we decompose the function $\Lambda(n)\cdot\mathbb{1}_{P^-(n)>y}$ by first replacing $\Lambda$ by $\mu*\log$. This yields the identity

$$\Lambda(n)\mathbb{1}_{P^-(n)>y}=\sum_{\substack{k\ell=n\\P^-(k\ell)>y}}\mu(k)\log\ell.\tag{23.13}$$

The fact that $\log1=0$ means that the above sum is supported on integers $\ell>1$. Since we also know that $P^-(\ell)>y$, we must have $\ell>y$. We thus see that we automatically have a long $\ell$ variable weighted with the smooth function $\log$ times the indicator function of integers free of prime factors $\le y$. Even though the latter is not a smooth function, it is quasi-smooth when $y$ is small enough. The reason is that Theorem 19.1 allows us to approximate the function $n\mapsto\mathbb{1}_{P^-(n)>y}=\mathbb{1}_{(n,P(y))=1}$ by convolutions $\lambda^\pm*1$, where $\lambda^\pm$ take values in $[-1,1]$ and have small support. Hence, for all practical purposes, we may think of the function $\ell\mapsto\mathbb{1}_{P^-(\ell)>y}\log\ell$ as a quasi-smooth function.




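Motivated by the above discussion, we split the right-hand side of (23.13) according to the size of $k$, which leads us to the following decomposition:

$$\Lambda=\Lambda^\sharp_{\mathrm{sieve}}+\Lambda^\flat_{\mathrm{sieve}}+R_{\mathrm{sieve}},\tag{23.14}$$

where

$$\Lambda^\sharp_{\mathrm{sieve}}(n):=\sum_{\substack{k\ell=n,\ k\le D\\P^-(k\ell)>y}}\mu(k)\log\ell,\tag{23.15}$$

$$\Lambda^\flat_{\mathrm{sieve}}(n):=\sum_{\substack{k\ell=n,\ k>D,\ \ell>y\\P^-(k\ell)>y}}\mu(k)\log\ell,\tag{23.16}$$

and $R_{\mathrm{sieve}}(n)=\mathbb{1}_{P^-(n)\le y}\Lambda(n)$. Note that $\Lambda^\sharp_{\mathrm{sieve}}$ is essentially of type I, $\Lambda^\flat_{\mathrm{sieve}}$ is of type II and $R_{\mathrm{sieve}}$ is of negligible size on average, since

$$\sum_{n\le x}R_{\mathrm{sieve}}(n)=\sum_{\substack{p\le y\\p^m\le x}}\log p\le\sum_{p\le y}\log x\le y\log x.\tag{23.17}$$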
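Decomposition (23.14) is an exact pointwise identity, and (23.17) is a simple counting bound; both can be checked numerically. The Python sketch below uses small, purely illustrative parameters ($N=5000$, $y=7$, $D=50$) rather than the sizes suggested by (23.18), and its helper name `sieve_tables` is hypothetical.

```python
import math


def sieve_tables(n: int):
    """Return spf, mu, Lambda for 0..n; spf[1] = n + 1 encodes P^-(1) = +infinity."""
    spf = list(range(n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if spf[i] == i:
            for j in range(i * i, n + 1, i):
                if spf[j] == j:
                    spf[j] = i
    mu, lam = [0] * (n + 1), [0.0] * (n + 1)
    mu[1] = 1
    for m in range(2, n + 1):
        p, r = spf[m], m // spf[m]
        mu[m] = 0 if r % p == 0 else -mu[r]
        while r % p == 0:
            r //= p
        lam[m] = math.log(p) if r == 1 else 0.0
    spf[1] = n + 1
    return spf, mu, lam


N, y, D = 5000, 7, 50                    # illustrative parameters
spf, mu, lam = sieve_tables(N)
sharp = [0.0] * (N + 1)                  # Lambda^sharp_sieve, (23.15)
flat = [0.0] * (N + 1)                   # Lambda^flat_sieve, (23.16)
for k in range(1, N + 1):
    if mu[k] == 0:
        continue
    for l in range(2, N // k + 1):       # l = 1 contributes log 1 = 0
        n = k * l
        if spf[n] <= y:                  # keep only P^-(k l) > y
            continue
        if k <= D:
            sharp[n] += mu[k] * math.log(l)
        else:
            flat[n] += mu[k] * math.log(l)
r_sieve = [lam[n] if spf[n] <= y else 0.0 for n in range(N + 1)]   # R_sieve
```

Note that the condition $\ell>y$ in (23.16) never has to be imposed explicitly in the loop: any $2\le\ell\le y$ forces $P^-(k\ell)\le y$.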


A choice of $y$ and $D$ that works for many applications is

$$y=\exp\{(\log x)^{\theta_1}\}\qquad\text{and}\qquad D=\exp\{(\log x)^{\theta_2}\},\tag{23.18}$$

where $0<\theta_1<\theta_2<1$ can be chosen freely.


The main advantage of compared to Vaughan's identity is that the functions $\Lambda^\sharp_{\mathrm{sieve}}$ and $\Lambda^\flat_{\mathrm{sieve}}$ are presieved with all primes $\le y$. This rather technical feature of plays a key role in the proof of Linnik's theorem in Chapter We will also see in Exercise how it leads to a better version of the Bombieri-Vinogradov theorem.


A secondary advantage of (23.14) versus Vaughan's identity is that its "main term" $\Lambda^\sharp_{\mathrm{sieve}}$ consists of a single type I function. This fact makes various calculations easier and will come into play in Chapter


On the other hand, Vaughan’s identity offers much more freedom in the 
choice of the parameters U and V. Therefore, we have more control over the 
support of the functions appearing in the type I and type II sums, which is 
very important in certain applications. In contrast, the parameters y and D 
in must be chosen carefully so that we have enough room to apply the Fundamental Lemma of Sieve Theory. In particular, $y$ must be $x^{o(1)}$.


Remark 23.3. It is possible to create a new combinatorial decomposition of $\Lambda$ that combines the best attributes of Vaughan's identity and of (23.14). This is done by presieving Vaughan's identity, that is to say, by multiplying all summands of with the function $n\mapsto\mathbb{1}_{P^-(n)>y}$.


There are many more combinatorial decompositions of von Mangoldt's function than the ones we discussed above. A formula of particular importance is Heath-Brown's identity, given in Exercise below. It is not exactly of the form (23.5), so working with it is a bit more complicated: one must first understand how to rearrange its terms to bring it to the form (23.5). However, Heath-Brown's identity has the important feature that all of the long functions appearing in it are smooth.


A further analysis of the subject of combinatorial decompositions of A 
can be found in Chapter 13] or Chapter 17]. Finally, a more sieve- 
theoretic approach to Vinogradov’s method that is more in line with the 
discussion in the introduction of this chapter is presented in Harman’s book 
on prime-detecting sieves [96]. 


The additive Fourier transform of the primes 


To exemplify Vinogradov's method, we employ it to study a concrete and rather important example: the exponential sum

$$\sum_{p\le x}e(\alpha p).$$

This sum is intimately related to the additive properties of primes and we will use it in the next chapter to study ternary arithmetic progressions in the primes. To get an idea of its size, we begin by studying it assuming the Generalized Riemann Hypothesis. This will serve as a guide for what kind of bounds to look for when we estimate it later via Vinogradov's method.

First, let us consider the special case when $\alpha$ is a rational number, say $\alpha=a/q$ with $(a,q)=1$. Then

$$\sum_{p\le x}e(ap/q)=\sum_{b\in(\mathbb{Z}/q\mathbb{Z})^*}e(ab/q)\,\pi(x;q,b)+\sum_{\substack{p\le x\\p\mid q}}e(ap/q)=\frac{\mathrm{li}(x)}{\varphi(q)}\sum_{b\in(\mathbb{Z}/q\mathbb{Z})^*}e(ab/q)+O(\sqrt x\,q\log(qx))$$

by Exercise and partial summation. Making the change of variables $n\equiv ab\pmod q$, we see that the sum over $b$ is the Gauss sum of the principal character mod $q$, which equals $\mu(q)$ (see Exercises and (10.5)). Therefore

$$\sum_{p\le x}e(pa/q)=\frac{\mu(q)}{\varphi(q)}\,\mathrm{li}(x)+O(\sqrt x\,q\log(qx)).\tag{23.19}$$
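Two ingredients of (23.19) are easy to test numerically: the evaluation $\sum_{b\in(\mathbb{Z}/q\mathbb{Z})^*}e(ab/q)=\mu(q)$ for $(a,q)=1$, and the resulting prediction $\sum_{p\le x}e(ap/q)\approx\mu(q)\pi(x)/\varphi(q)$. This Python sketch is only a numerical illustration, not a check of the Generalized Riemann Hypothesis; it uses $\pi(x)$ as a stand-in for $\mathrm{li}(x)$, and the values $q=5$, $a=2$, $x=10^5$ and all helper names are illustrative assumptions.

```python
import cmath
import math


def e(t: float) -> complex:
    """Additive character e(t) = exp(2*pi*i*t), with t reduced mod 1 first."""
    return cmath.exp(2j * math.pi * (t % 1.0))


def mobius(q: int) -> int:
    """Moebius function by trial division."""
    result, m, p = 1, q, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            result = -result
        p += 1
    return -result if m > 1 else result


def ramanujan_sum(q: int, a: int) -> complex:
    """Sum of e(ab/q) over reduced residues b mod q."""
    return sum(e(a * b / q) for b in range(1, q + 1) if math.gcd(b, q) == 1)


def primes_upto(n: int) -> list[int]:
    """Return all primes <= n (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


x, q, a = 10**5, 5, 2                          # illustrative choices
ps = primes_upto(x)
phi_q = sum(1 for b in range(1, q + 1) if math.gcd(b, q) == 1)
S = sum(e((a * p % q) / q) for p in ps)        # sum over p <= x of e(ap/q)
prediction = mobius(q) * len(ps) / phi_q       # mu(q) pi(x) / phi(q)
print(abs(S - prediction) / len(ps))
```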


To estimate $\sum_{p\le x}e(p\alpha)$ for irrational $\alpha$, we find a good rational approximation to it using the following classical result.


Lemma 23.4 (Dirichlet's approximation theorem). Let $\alpha\in\mathbb{R}$ and $Q\ge1$. There is a reduced fraction $a/q$ with $q\le Q$ and

$$\Big|\alpha-\frac aq\Big|\le\frac1{qQ}.$$




Proof. Consider the $\lfloor Q\rfloor+1$ numbers $\alpha q$ with $0\le q\le Q$. We reduce them mod 1 to place them in the interval $[0,1)$. By the pigeonhole principle, there must exist $0\le q_1<q_2\le Q$ such that $\|\alpha q_2-\alpha q_1\|\le1/(\lfloor Q\rfloor+1)<1/Q$. We then take $q'=q_2-q_1$ and $a'$ to be the unique integer in $[\alpha q'-1/2,\alpha q'+1/2)$, so that $1\le q'\le Q$ and $|\alpha q'-a'|=\|\alpha q'\|<1/Q$. Letting $a/q$ be the fraction $a'/q'$ in reduced form completes the proof.
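The pigeonhole argument translates almost line by line into code. In the Python sketch below (floating-point, with the hypothetical helper name `dirichlet_approx`), the fractional parts of $\alpha q$, $0\le q\le\lfloor Q\rfloor$, are sorted around the circle $\mathbb{R}/\mathbb{Z}$. The smallest gap between neighbours is at most $1/(\lfloor Q\rfloor+1)<1/Q$, and the two endpoints of that gap give $q_1,q_2$.

```python
import math


def dirichlet_approx(alpha: float, Q: float) -> tuple[int, int]:
    """Return (a, q) with gcd(a, q) = 1, 1 <= q <= Q and |alpha - a/q| <= 1/(qQ)."""
    N = math.floor(Q)
    # Fractional parts of alpha*q for q = 0, ..., N, sorted around the circle R/Z.
    pts = sorted(((alpha * q) % 1.0, q) for q in range(N + 1))
    best_gap, q1, q2 = math.inf, 0, 0
    for i in range(len(pts)):
        (x1, s1), (x2, s2) = pts[i], pts[(i + 1) % len(pts)]
        gap = (x2 - x1) % 1.0            # arc from x1 to x2; the last pair wraps around
        if gap < best_gap:
            best_gap, q1, q2 = gap, s1, s2
    qp = abs(q2 - q1)                    # q' = |q2 - q1|, so 1 <= q' <= Q
    ap = math.floor(alpha * qp + 0.5)    # a' = nearest integer to alpha*q'
    g = math.gcd(ap, qp)
    return ap // g, qp // g              # a'/q' in reduced form
```

For instance, `dirichlet_approx(math.pi, 100)` returns some reduced fraction satisfying the inequality of Lemma 23.4. As in the proof, reducing $a'/q'$ can only shrink the denominator, so the bound $1/(qQ)$ survives the reduction.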


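Fix $Q$ and $a/q$ as in Lemma 23.4. If we write $\alpha=\beta+a/q$, then

$$\sum_{p\le x}e(\alpha p)=\int_{2^-}^xe(\beta y)\,d\Big(\sum_{p\le y}e(ap/q)\Big)=\frac{\mu(q)}{\varphi(q)}\int_2^x\frac{e(\beta y)}{\log y}\,dy+O\big((1+|\beta|x)\,q\sqrt x\log x\big)\tag{23.20}$$

by partial summation. Since $|\beta|\le1/(qQ)$, taking $Q=\sqrt x(\log x)^3$ yields

$$\sum_{p\le x}e(p\alpha)=\frac{\mu(q)}{\varphi(q)}\int_2^x\frac{e(\beta y)}{\log y}\,dy+O\Big(\frac x{(\log x)^2}+\sqrt x\,q\log x\Big).\tag{23.21}$$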


In particular, we see that if $\alpha$ is close to a rational number of denominator $q\in[(\log x)^2,\sqrt x/(\log x)^3]$, then there is significant cancellation among the numbers $e(\alpha p)$ with $p\le x$, which makes $\sum_{p\le x}e(\alpha p)$ smaller than $\pi(x)$.


The above calculation is a manifestation of an important principle stemming from the Hardy-Littlewood circle method that we will study in detail in Chapter 24: the Fourier transform

$$\sum_{n\le x}c_ne(n\alpha)\tag{23.22}$$

of various interesting arithmetic sequences $(c_n)_{n\le x}$ is big when $\alpha$ lies close to a rational number of small denominator, and it is small otherwise. The rough heuristic to explain this dichotomy is that when $\alpha$ is far from any fraction of small denominator, the sequence $(e(n\alpha))_{n\le x}$ lacks any meaningful arithmetic structure, so that it cannot correlate with any "reasonably regular" sequence $(c_n)_{n\le x}$.

A central problem in analytic number theory is to establish strong estimates for the exponential sum $\sum_{n\le x}c_ne(n\alpha)$: an asymptotic formula when $\alpha$ is close to a fraction of small denominator, and a non-trivial upper bound otherwise. In particular, we would like to do so when $c_n$ is the indicator function of the primes without appealing to the unproven Generalized Riemann Hypothesis.


Type I exponential sums 


In view of the decomposition of $\Lambda$ into type I and type II functions, the estimation of $\sum_{p\le x}e(\alpha p)$ boils down to the estimation of $\sum_{n\le x}(f*g)(n)e(\alpha n)$ when $f*g$ is a function of type I or II. We begin by studying the first category of functions.


Let us begin by handling the simplest non-trivial type I function: the constant function 1. Arguing as in (10.12), we have

$$\Big|\sum_{n\le x}e(\alpha n)\Big|=\Big|\frac{e(\alpha(\lfloor x\rfloor+1))-e(\alpha)}{e(\alpha)-1}\Big|\le\frac2{|1-e(\alpha)|}\le\frac1{2\|\alpha\|},\tag{23.23}$$

where we recall that $\|\alpha\|$ denotes the distance of $\alpha$ from the nearest integer. We thus immediately see that, as long as $\|\alpha\|^{-1}=o(x)$, the sum $\sum_{n\le x}e(\alpha n)$ is small compared to the trivial bound

$$\Big|\sum_{n\le x}e(\alpha n)\Big|\le\sum_{n\le x}1\le x.\tag{23.24}$$


Using partial summation, we may easily pass from (23.23) and (23.24) to an estimate for the Fourier transform of the function $\log^\nu$, where $\nu$ is any fixed positive real number. Indeed, we have

$$\sum_{n\le x}(\log n)^\nu e(n\alpha)=\int_{1^-}^x(\log t)^\nu\,d\Big(\sum_{n\le t}e(n\alpha)\Big)\ll(\log x)^\nu\cdot\min\{x,\|\alpha\|^{-1}\}\tag{23.25}$$

uniformly for $x\ge1$ and $\nu\ge0$. Similar estimates are true if we replace $\log^\nu$ by a more general smooth function, but we will not need them.
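The bounds (23.23)-(23.25) can be checked with explicit constants. Every partial sum $\sum_{n\le t}e(n\alpha)$ has modulus at most $M=\min\{x,1/(2\|\alpha\|)\}$, and Abel summation with the non-decreasing weight $(\log n)^\nu$ then gives $|\sum_{n\le x}(\log n)^\nu e(n\alpha)|\le2(\log x)^\nu M$. The Python sketch below tests both inequalities; the sample values of $\alpha$, $x$, $\nu$ and the helper names are illustrative assumptions.

```python
import cmath
import math


def e(t: float) -> complex:
    """Additive character e(t) = exp(2*pi*i*t), with t reduced mod 1 first."""
    return cmath.exp(2j * math.pi * (t % 1.0))


def dist_to_int(t: float) -> float:
    """||t||, the distance from t to the nearest integer."""
    return abs(t - round(t))


def geometric_bound(alpha: float, x: int) -> float:
    """M = min{x, 1/(2||alpha||)}, combining (23.23) and (23.24)."""
    d = dist_to_int(alpha)
    return x if d == 0 else min(x, 1 / (2 * d))


def weighted_sum(alpha: float, x: int, nu: float) -> complex:
    """Sum over n <= x of (log n)^nu e(n alpha); nu = 0 gives the plain geometric sum."""
    return sum(math.log(n) ** nu * e(n * alpha) for n in range(1, x + 1))


samples = [(al, x) for al in (0.5, 0.1234, math.sqrt(2), 1e-3) for x in (1000, 5000)]
```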

The above observations and the simplest version of Dirichlet's hyperbola method allow us to establish non-trivial estimates for general exponential sums of type I when $\alpha$ is close to a fraction $a/q$ of large denominator (say, with $q\ge(\log x)^A$ for some large $A$). The notation $\|f\|_\infty$ in the statement of Theorem 23.5 below stands for the supremum norm of $f$. Finally, its proof features an important concept in the study of exponential sums: we say that a set of real numbers $\{\alpha_1,\dots,\alpha_R\}$ is $\delta$-spaced mod 1 if

$$\|\alpha_i-\alpha_j\|\ge\delta\qquad\text{whenever }i\ne j.\tag{23.26}$$
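Checking (23.26) only requires sorting the fractional parts: the closest pair on the circle $\mathbb{R}/\mathbb{Z}$ is always a pair of neighbours, including the wrap-around pair. The Python sketch below uses this to test the spacing facts exploited in the proof of Theorem 23.5 below. It checks that the numbers $k\alpha$ in a block of $\lfloor q/2\rfloor$ consecutive $k$ are $(2q)^{-1}$-spaced, and that $\|k\alpha\|\ge1/(2q)$ for $1\le k\le q/2$. The example $\alpha=\sqrt2$, $a/q=99/70$ and the helper names are illustrative assumptions.

```python
import math


def is_delta_spaced(points, delta: float) -> bool:
    """True iff ||a_i - a_j|| >= delta for all i != j, as in (23.26)."""
    fr = sorted(p % 1.0 for p in points)
    if len(fr) < 2:
        return True
    gaps = [b - a for a, b in zip(fr, fr[1:])]
    gaps.append(1.0 - fr[-1] + fr[0])      # wrap-around gap on R/Z
    return min(gaps) >= delta


def dist_to_int(t: float) -> float:
    """||t||, the distance from t to the nearest integer."""
    return abs(t - round(t))


alpha = math.sqrt(2)
a, q = 99, 70                              # |sqrt(2) - 99/70| < 1/70^2
qt = q // 2
blocks_ok = all(
    is_delta_spaced([k * alpha for k in range(m * qt + 1, (m + 1) * qt + 1)], 1 / (2 * q))
    for m in range(50)
)
small_k_ok = all(dist_to_int(k * alpha) >= 1 / (2 * q) for k in range(1, q // 2 + 1))
```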


Theorem 23.5. Let $f:\mathbb{N}\to\mathbb{C}$ be supported on $[1,y]$, $\nu\ge0$, $x\ge2$, $\alpha\in\mathbb{R}$ and $a/q$ be a reduced fraction with $|\alpha-a/q|\le1/q^2$. Then

$$\sum_{n\le x}(f*\log^\nu)(n)e(n\alpha)\ll\Big(y+\frac xq+q\Big)(\log x)^{\nu+1}\|f\|_\infty.\tag{23.27}$$

Proof. If $q=1$, $q\ge x$ or $y\ge x$, we simply note that $|(f*\log^\nu)(n)|\le\|f\|_\infty\tau(n)(\log n)^\nu$ and use Theorem 3.3. Assume now that $2\le q<x$ and $y<x$. Opening the convolution and applying (23.25) yields

$$\sum_{n\le x}(f*\log^\nu)(n)e(n\alpha)=\sum_{k\le y}f(k)\sum_{\ell\le x/k}(\log\ell)^\nu e(\ell\cdot k\alpha)\ll(\log x)^\nu\|f\|_\infty\sum_{k\le y}\min\{x/k,\,1/\|k\alpha\|\}.\tag{23.28}$$

We cover the last sum by subsums of length $\tilde q:=\lfloor q/2\rfloor$ defined by

$$S_m:=\sum_{m\tilde q<k\le(m+1)\tilde q}\min\{x/k,\,1/\|k\alpha\|\}.$$


Since $(a,q)=1$, the numbers $ka/q$ with $m\tilde q<k\le(m+1)\tilde q$ are all distinct mod 1. Hence, $\|k_1a/q-k_2a/q\|\ge1/q$ whenever $m\tilde q<k_1<k_2\le(m+1)\tilde q$. On the other hand, if we write $\alpha=a/q+\beta$, then $|k_1\beta-k_2\beta|\le\tilde q|\beta|\le(q/2)/q^2=1/(2q)$. As a consequence, we find that the numbers $k\alpha$ with $m\tilde q<k\le(m+1)\tilde q$ are $(2q)^{-1}$-spaced mod 1. We index them as $\alpha_1,\dots,\alpha_{\tilde q}$ in a way that $\|\alpha_1\|\le\dots\le\|\alpha_{\tilde q}\|$. For each integer $j\in[1,\tilde q]$, the interval $(-\frac{j-1}{4q},\frac{j-1}{4q})$ can contain at most $j-1$ of the reductions mod 1 of the numbers $\alpha_1,\dots,\alpha_{\tilde q}$. Hence, we must have that $\|\alpha_j\|\ge(j-1)/(4q)$ for $j=1,\dots,\tilde q$.


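When $m\ge1$, the above discussion and the fact that $x/k\le x/(m\tilde q)$ whenever $k>m\tilde q$ yield the inequality

$$S_m\le\frac x{m\tilde q}+\sum_{2\le j\le q/2}\frac{4q}{j-1}\ll\frac x{mq}+q\log q.\tag{23.29}$$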

However, when $m=0$, we cannot use the above argument as it currently stands because we do not have a good bound for the summand of $S_0$ corresponding to the integer $k$ with $k\alpha=\alpha_1$. Note though that if $1\le k\le q/2$, then $|k\beta|\le(q/2)/q^2=1/(2q)$ and $\|ka/q\|\ge1/q$. Therefore, $\|k\alpha\|\ge1/(2q)$ for all $k\in\mathbb{Z}\cap[1,q/2]$. In particular, $\|\alpha_1\|\ge1/(2q)$ when $m=0$, and thus

$$S_0\le2q+\sum_{2\le j\le q/2}\frac{4q}{j-1}\ll q\log q.\tag{23.30}$$


Combining (23.29) with (23.30), and noticing that there are $\ll y/\tilde q\ll y/q$ integers $m\in[1,y/\tilde q]$, allows us to estimate the expression in (23.28) and complete the proof of the theorem.
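The proof of Theorem 23.5 actually yields an explicit inequality: summing (23.30) for $m=0$ and (23.29) for the remaining blocks gives
$\sum_{k\le y}\min\{x/k,1/\|k\alpha\|\}\le2q+T+\sum_{1\le m<\lceil y/\tilde q\rceil}\big(x/(m\tilde q)+T\big)$,
where $T=4q\sum_{1\le i<\tilde q}1/i$. The Python sketch below compares the two sides for a few convergents of $\sqrt2$ and $\pi$; these test cases and the helper names are illustrative assumptions.

```python
import math


def dist_to_int(t: float) -> float:
    """||t||, the distance from t to the nearest integer."""
    return abs(t - round(t))


def type_i_sum(alpha: float, x: float, y: float) -> float:
    """The sum over k <= y of min{x/k, 1/||k alpha||} appearing in (23.28)."""
    total = 0.0
    for k in range(1, int(y) + 1):
        d = dist_to_int(k * alpha)
        total += x / k if d == 0 else min(x / k, 1 / d)
    return total


def explicit_bound(x: float, y: float, q: int) -> float:
    """(23.30) for the block m = 0 plus (23.29) for each block m >= 1 (requires q >= 2)."""
    qt = q // 2
    tail = 4 * q * sum(1 / i for i in range(1, qt))    # sum_{2 <= j <= qt} 4q/(j - 1)
    blocks = math.ceil(int(y) / qt)                      # blocks m = 0, ..., blocks - 1
    return 2 * q + tail + sum(x / (m * qt) + tail for m in range(1, blocks))


cases = [(math.sqrt(2), 99, 70), (math.pi, 22, 7), (math.pi, 355, 113)]
ranges = [(10**5, 500), (10**4, 2000)]
```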


Type II exponential sums 


Let us now consider the exponential sum $\sum_{n\le x}(f*g)(n)e(\alpha n)$ for a type II function $f*g$. For concreteness, we assume momentarily that $\mathrm{supp}(f)\subset[1,y]$ and $\mathrm{supp}(g)\subset[1,z]$ with $y=x^\theta$ and $z=x^{1-\theta}$ for some $\theta\in(0,1)$. We then find that

$$\sum_{n\le x}(f*g)(n)e(\alpha n)=\sum_{\substack{k\le y,\ \ell\le z\\k\ell\le x}}f(k)g(\ell)e(\alpha k\ell).\tag{23.31}$$


The advantage of this formula is that it transforms the Fourier transform of $f*g$ into a double sum that we can interpret as an average of many sums. For instance, we may arrange the summation as

$$\sum_{n\le x}(f*g)(n)e(\alpha n)=\sum_{k\le y}f(k)\sum_{\ell\le\min\{z,x/k\}}g(\ell)e(\alpha k\ell).\tag{23.32}$$

In practice, we do not know much about the function $g$, so that for a given $k$ we cannot hope to do much better than the trivial upper bound

$$\Big|\sum_{\ell\le\min\{z,x/k\}}g(\ell)e(\alpha k\ell)\Big|\le\sum_{\ell\le\min\{z,x/k\}}|g(\ell)|.\tag{23.33}$$

(Consider for instance the case when $g(\ell)=e(-\alpha\ell)$ and $k=1$.) However, it turns out that (23.33) can be improved for most $k$, something that we can take advantage of since we are averaging over many values of $k$.


We begin by noticing that the sum in the left-hand side of (23.33) can be interpreted as the Hermitian inner product over $\mathbb{C}^d$ of the vectors

$$\vec g=(g(\ell))_{\ell=1}^d\qquad\text{and}\qquad\vec u_k=\big(\mathbb{1}_{\ell\le x/k}\cdot e(-k\ell\alpha)\big)_{\ell=1}^d,$$

where $d=\lfloor z\rfloor$. The key observation is that if $\alpha\approx a/q$ with large $q$, then the vectors $\vec u_k$ are approximately orthogonal to each other, so that the fixed vector $\vec g$ cannot correlate strongly with many of them. Consequently, we expect that the trivial bound (23.33) can be improved significantly for most values of $k$.


To see the claim that the vectors $\vec u_k$ are mutually quasi-orthogonal, note that relation (23.23) implies the estimate

$$|\langle\vec u_{k_1},\vec u_{k_2}\rangle|=\Big|\sum_{\ell\le\min\{z,x/k_1,x/k_2\}}e(-k_1\ell\alpha)\,e(k_2\ell\alpha)\Big|\le\frac1{2\|(k_2-k_1)\alpha\|}.\tag{23.34}$$

Generalizing the argument used to prove Theorem 23.5, we will show that if $\alpha$ is far from fractions of small denominator, the quantity $\|(k_2-k_1)\alpha\|$ is away from 0 for most pairs $(k_1,k_2)$ with $k_1\ne k_2$, so that $\langle\vec u_{k_1},\vec u_{k_2}\rangle$ is small.

The above ideas will be vastly generalized in Chapter 25, where we study bounds for general bilinear sums $\sum_m\sum_na_{mn}x_my_n$. We will prove there that there is some $\Delta$ that depends at most on the coefficients $a_{mn}$ such that

$$\Big|\sum_{m=1}^M\sum_{n=1}^Na_{mn}x_my_n\Big|\le\Delta\Big(\sum_{m=1}^M|x_m|^2\Big)^{1/2}\Big(\sum_{n=1}^N|y_n|^2\Big)^{1/2}.$$

For now, we use this circle of ideas to derive a strong bound for the Fourier transform of type II functions. The notation $\|f\|_2$ in the statement of Theorem 23.6 stands for the $\ell^2$-norm of $f$, that is to say, $\|f\|_2=\big(\sum_{n\ge1}|f(n)|^2\big)^{1/2}$.


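Theorem 23.6. Let $f,g:\mathbb{N}\to\mathbb{C}$ be two arithmetic functions such that $\mathrm{supp}(f)\subset[1,y]$ and $\mathrm{supp}(g)\subset[1,z]$. In addition, consider $\alpha\in\mathbb{R}$ and a reduced fraction $a/q$ such that $|\alpha-a/q|\le1/q^2$. For all $x\ge1$, we have

$$\sum_{n\le x}(f*g)(n)e(n\alpha)\ll\Big(\frac{yz}q+y+z+q\Big)^{1/2}\sqrt{\log(2q)}\cdot\|f\|_2\|g\|_2.$$

In particular, if $yz\le2x$ and $|f|,|g|\le1$, so that $\|f\|_2\le\sqrt y$ and $\|g\|_2\le\sqrt z$, then

$$\sum_{n\le x}(f*g)(n)e(n\alpha)\ll\Big(\frac x{\sqrt y}+\frac x{\sqrt z}+\frac x{\sqrt q}+\sqrt{xq}\Big)\sqrt{\log(2q)}.$$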


Proof. Let $S$ be the sum we want to bound, which we arrange in the "dual form"³ to (23.32):

$$S=\sum_{\ell\le z}g(\ell)\sum_{k\le y,\ k\ell\le x}f(k)e(k\ell\alpha).$$

We use the Cauchy-Schwarz inequality to remove the unknown function $g$:

$$|S|^2\le\|g\|_2^2\sum_{\ell\le z}\Big|\sum_{k\le y,\ k\ell\le x}f(k)e(k\ell\alpha)\Big|^2.$$

As a result, the variable $\ell$ is now weighted with the smooth function 1. Opening the square via the identity $|w|^2=w\bar w$ yields that

$$|S|^2\le\|g\|_2^2\sum_{\ell\le z}\ \sum_{\substack{k_1,k_2\le y\\k_1\ell,\,k_2\ell\le x}}f(k_1)\overline{f(k_2)}\,e((k_1-k_2)\ell\alpha)=\|g\|_2^2\sum_{k_1,k_2\le y}f(k_1)\overline{f(k_2)}\sum_{\ell\le\min\{z,x/k_1,x/k_2\}}e((k_1-k_2)\ell\alpha).\tag{23.35}$$

We bound the innermost sum of (23.35) using (23.34) to find that

$$|S|^2\le\|g\|_2^2\sum_{k_1,k_2\le y}|f(k_1)f(k_2)|\min\Big\{z,\frac1{2\|(k_2-k_1)\alpha\|}\Big\}.$$

To remove one of the unknown factors $f(k_j)$, we use the inequality $|uv|\le(|u|^2+|v|^2)/2$. This implies that

$$|S|^2\le\|g\|_2^2\sum_{j\in\{1,2\}}\ \sum_{k_1,k_2\le y}\frac{|f(k_j)|^2}2\min\Big\{z,\frac1{2\|(k_2-k_1)\alpha\|}\Big\}.$$


³This terminology will be explained in Chapter




The theorem will then follow if we can prove the following estimate:

$$\sum_{k\le y}\min\Big\{z,\frac1{\|k\alpha+\gamma\|}\Big\}\ll\Big(\frac{yz}q+y+z+q\Big)\log(2q),\tag{23.36}$$

uniformly for all $\gamma\in\mathbb{R}$. This will be demonstrated by adapting the argument of the proof of Theorem 23.5.


If $q=1$, (23.36) follows by majorizing all summands by $z$. Let us consider now the more interesting case when $q\ge2$. We let $\tilde q=\lfloor q/2\rfloor$ and break the interval $[1,y]$ into subintervals of length $\tilde q$ to find that

$$\sum_{k\le y}\min\Big\{z,\frac1{\|k\alpha+\gamma\|}\Big\}\le\sum_{m=0}^{\lfloor y/\tilde q\rfloor}\ \sum_{k=m\tilde q+1}^{(m+1)\tilde q}\min\Big\{z,\frac1{\|k\alpha+\gamma\|}\Big\}.\tag{23.37}$$

Fix $m\in\mathbb{Z}_{\ge0}$. Arguing as in the proof of Theorem 23.5, we find that the numbers $k\alpha+\gamma$ are $(2q)^{-1}$-spaced mod 1 when $m\tilde q<k\le(m+1)\tilde q$. Hence, a straightforward adaptation of the proof of (23.29) implies that

$$\sum_{m\tilde q<k\le(m+1)\tilde q}\min\Big\{z,\frac1{\|k\alpha+\gamma\|}\Big\}\le z+\sum_{2\le j\le q/2}\frac{4q}{j-1}\ll z+q\log q.$$

Inserting this bound into (23.37) completes the proof of (23.36), and hence of the theorem.
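Tracking the constants in the proof of Theorem 23.6 gives the explicit inequality $|S|\le\|f\|_2\|g\|_2\sqrt B$. Here $B=\lceil y/\tilde q\rceil\big(z+4q\sum_{1\le i<\tilde q}1/i\big)$ bounds the left-hand side of (23.36) for every $\gamma$ when $q\ge2$. The Python sketch below tests this for random unimodular $f$ and $g$. The choice $\alpha=\sqrt2$ with $a/q=99/70$, $y=z=300$, $x=yz$, the seed and the helper names are illustrative assumptions. The last assertion in the test only records that, in this example, the bound beats the trivial bound $yz$.

```python
import cmath
import math
import random


def e(t: float) -> complex:
    """Additive character e(t) = exp(2*pi*i*t), with t reduced mod 1 first."""
    return cmath.exp(2j * math.pi * (t % 1.0))


def type_ii_sum(f: list, g: list, alpha: float, x: int) -> complex:
    """S from (23.31): sum over k <= y, l <= z, kl <= x of f(k) g(l) e(alpha k l); index 0 unused."""
    y, z = len(f) - 1, len(g) - 1
    return sum(f[k] * g[l] * e(alpha * k * l)
               for k in range(1, y + 1)
               for l in range(1, min(z, x // k) + 1))


def explicit_bound(f: list, g: list, q: int) -> float:
    """||f||_2 ||g||_2 sqrt(B), with B the block bound for (23.36) (requires q >= 2)."""
    y, z = len(f) - 1, len(g) - 1
    qt = q // 2
    B = math.ceil(y / qt) * (z + 4 * q * sum(1 / i for i in range(1, qt)))
    norm_f = math.sqrt(sum(abs(c) ** 2 for c in f[1:]))
    norm_g = math.sqrt(sum(abs(c) ** 2 for c in g[1:]))
    return norm_f * norm_g * math.sqrt(B)


rng = random.Random(2)
y = z = 300
x = y * z
alpha, q = math.sqrt(2), 70                    # 99/70 approximates sqrt(2) within 1/70^2
f = [0] + [e(rng.random()) for _ in range(y)]  # unimodular, otherwise arbitrary
g = [0] + [e(rng.random()) for _ in range(z)]
S = type_ii_sum(f, g, alpha, x)
```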


Remark 23.7. Remarkably, the estimate for $\sum_{n\le x}(f*g)(n)e(\alpha n)$ supplied by Theorem 23.6 is essentially sharp. For simplicity, we consider only the case when $y\ge z$, since the other one is symmetric.

Indeed, let $x\ge yz\ge x/2$ and choose $f(k)$ to be the complex conjugate of $\sum_{\ell\le z}g(\ell)e(\alpha k\ell)$. We then have

$$\sum_{n\le x}(f*g)(n)e(\alpha n)=\sum_{k\le y}\Big|\sum_{\ell\le z}g(\ell)e(\alpha k\ell)\Big|^2=\|f\|_2^2.$$

If we now let $\{g(\ell)\}_{\ell\le z}$ be a sequence of independent random variables with $\mathbb{P}(g(\ell)=1)=\mathbb{P}(g(\ell)=-1)=1/2$, we find that

$$\mathbb{E}\Big[\sum_{k\le y}\Big|\sum_{\ell\le z}g(\ell)e(\alpha k\ell)\Big|^2\Big]=\sum_{k\le y}\sum_{\ell\le z}1=\lfloor y\rfloor\lfloor z\rfloor.$$

In particular, there must exist a choice of $g(\ell)$ such that $\sum_{n\le x}(f*g)(n)e(\alpha n)=\|f\|_2^2\gg x$. Since $\|g\|_2^2=\lfloor z\rfloor$, we infer that

$$\sum_{n\le x}(f*g)(n)e(\alpha n)=\|f\|_2^2\gg\sqrt x\,\|f\|_2\asymp\sqrt y\cdot\|f\|_2\|g\|_2.$$

By swapping the roles of $f$ and $g$, we can also find choices of them such that

$$\Big|\sum_{n\le x}(f*g)(n)e(\alpha n)\Big|\gg\sqrt z\cdot\|f\|_2\|g\|_2.$$

The additive Fourier transform of the primes: Encore 247 


Finally, let us consider the case when a = a/q, f(k) = e(—ak/q) for 
k <y, and g(€) = 11 (modq) for € < z. We then have 


So(f*gyelan)= SOS 1xy-(2/¢+1) = Vuz/a+y- I fllellglle- 


N<L k<y, lz 
=1 (mod q) 
To conclude, a general estimate for > 7, <,(f*g)e(an) can never be better 


than max{y, z, yz/q}"/?||f\l2||gll2, and TheoremP3.6]comes remarkably close 
to this bound. 


The additive Fourier transform of the primes: Encore 


We shall now apply the methods we have developed to establish Vino- 
gradov’s famous estimate. 


Theorem 23.8. Let a € R and consider a reduced fraction a/q such that 
|a—a/q| <1/q?. For all x > 2, we have 


S> A(n) 0) «(T+ ell + Vag) (loge)? 


NKx 


Proof. We may assume that q < x; otherwise, the theorem follows by 
bounding all summands by log z. 

Let us decompose A using Vaughan’s identity. First, we deal with A®. 
We apply Theorem [23.5] twice, once to the convolution p<y * log (so v = 1 
and f = wy here, with y = V and ||f||.. = 1) and once to (cy * Acy) * 
(sov = 0 and f = wey * Acy here, with y = UV and |f| < 1* A = log, 
whence || f|o. < log(UV)). We thus conclude that 


(23.38) S- Ai(n)e(na) < (uv ‘ ; i a) log?(2UV). 


NKxr 


Next, we deal with A’. We rewrite this function using (23.12) and apply 
Theorem to each summand f; «9; of that identity. Since g < z, || f;|]2 < 
2) log? a and ||g;||3 < 2/27", we find that 


oy 0 
> A? (n)e(na) « ie Ge + V Qa + eR + Via) (log x)°/?. 
nX<x U<2i<2xr/V 


We note that V2i2 < a/VV and «/2)/? < x/VU. Applying these bounds 
to each of the O(log x) choices of j yields the estimate 


(23.39) S>A’(n)e(na) + faq ) (log x)°/2, 


NnKxr 


a rae a a 


248 23. Vinogradov’s method 


Since we also have that |}¢,<, A<u(n)e(na)| < Vin<cy A(n) K U andq < 
,/xq by our assumption that gq < x, Vaughan’s identity in combination with 


(23.38) and (23.39) implies that 
S > A(n) e(na) )< (UV +2/Vq+2/VU +2/VV + 29) (logx)?/”. 


n<x 


Taking U = V = x?/° to optimize the above bound completes the proof. 


Theorem [23.8] confirms the prediction we made using the Generalized 
Riemann Hypothesis that the exponential sum )7,,<, A(n)e(an) can only be 
large when a is close to a rational number with small denominator. Indeed, 
if |a — a/q| < 1/q? with (logx)4 < q < x/(logx)4, we find that 


(23.40) SAG) e(an) <4 x/(log x)(4-5)/?, 


Nx 


We will demonstrate the utility of this key estimate in the next chapter. 


Conclusion 


Vinogradov’s method allows us to deal with very general sums of the form 


(23.41) S > an A(n) 
Nx 

where (a;,)°2, is some interesting sequence. To estimate (23.41), we first use 
various combinatorial ideas such as convolution etn a ae it- 
erations to obtain an appropriate decomposition of A of the form (23.5). We 
then handle quasi-smooth sums, namely sums of the form )7,,<, bie 7 * ae ) 
with f “small” and g smooth, using a mix of tools such as the summation 
formulas of Poisson and of Euler-Maclaurin, L-functions, sieves and esti- 
mates for exponential sums (e.g. the Pélya- Vinogradov inequality and other 
more advanced results beyond the scope of this book). Finally, we estimate 
bilinear sums, namely sums of the form )/,,<, @n(f*g)(n) with f and g both 
supported on large integers, by employing methods arising from the theory 
of bilinear forms that we will fully develop in Chapter 25] (with the Cauchy- 
Schwarz inequality playing a central role), coupled with various exponential 
sum estimates. We thus see that this approach to the distribution of primes 
utilizes the full toolset we have at our disposal. 


Exercises 


Exercise 23.1. Consider a € R and a reduced fraction a/q such that |a — a/q| < 
1/q?. Prove that 


So r(nje(an) K (Ve+q+x/q)logz (a > 2). 


nka 


Exercises 249 


Exercise 23.2. Let v > 0, let f be an arithmetic function supported in [1, y], and 
x be a non-principal Dirichlet character mod q. For x > 2, prove that 


S°(f * log”)(n)x(n) < V/a(log q) (log x)” S © | f (k) 

n<x k<y 
Exercise 23.3. Let r,s > 1 be such that 1/r +1/s = 1. Assuming the set-up of 
Theorem [23.5| prove that 


SUF #loE")(n)e(na) <, (yl!"gl/* + 4+ 2/4) (loge) "Ife 
nXkx 
where || {lls = (Ore |F(A)S)”*- 
Exercise 23.4% 
(a) For any U,V > 1, prove that 
b= —pU<u *U<v 1 + Mou * sv *l + usu + ev. 


(b) Let a € R, and let a/q be a reduced fraction with |a — a/q| < 1/q?. For every 
fixed ¢ > 0, show that 


S- u(n)e(na) <e (@/Vq + 2G) (log x)? + gers (x > 3). 
n<Kx 
[Hint: Select U = V = min{x?/>,q,2/q} in part (a).] 
(c) (Davenport) Fix A > 1. Prove that 
se u(n)e(na) <4 x/(loga)4 (4 >2, aER). 
nxn 


Exercise 23.5 (Heath-Brown’s identity). Let k ¢ N, > 1 and V > a'/*. For 
n <a, show that 


k 

=S- va a (log+ 1+: & Le cy *-°° * U<y)(n). 
: e———+._ ,-$-_ —_ 
g=1 j- ary times j times 


[Hint: Let f = wey *land g = pyy*l. On the one hand, we have Axg *---* g =0 
lh 


k times 


on Ng,. On the other hand, g = 6 — f with 6(n) = 1n=1.] 


Chapter 24 


Ternary arithmetic 
progressions 


Vinogradov’s method allows us to advance significantly our understand- 
ing of additive patterns among the primes. We exemplify this principle here 
by proving the existence of infinitely many ternary arithmetic progressions 
a,a+d,a-+2d all of whose elements are prime numbers. The same ideas can 
also be used to study the ternary Goldbach conjecture (stating that every 
odd integer > 7 is the sum of three primes), as well as other similar “ternary 
additive problems” (see Exercise 24.1] as well as Chapter 26 of Davenport’s 
book [8I]). On the contrary, binary additive problems, such as the twin 
prime conjecture or the binary Goldbach conjecture (every even integer > 4 
is the sum of two primes), are generally out of the reach of the currently 
available methods. We give a brief explanation of the added difficulties when 
dealing with binary problems in the last section of this chapter. 


We now state the main result of this chapter. 


Theorem 24.1. Fiz A>0. For x > 2, we have 


LD AMraaloa)Alos) = 2? T] (1 Gaze) + Oa Toga: 


11,N2,NZ3LL p23 
ng—N=Nn3—N2 
Remark 24.2. Green and Tao [77] proved that, for any given k, there 
are infinitely many k-step arithmetic progressions a, a + d, a+ 2d, ..., 
a+ (k —1)d all of whose elements are prime numbers. Their proof uses 
techniques related to the celebrated theorem of Szemerédi that lie beyond 
the scope of this book. This theorem states that if a set of integers A has 
positive lower density, in the sense that #AN [1,2] >> x for infinitely many 


250 


The Hardy-Littlewood circle method 251 


x, then A contains arbitrarily long arithmetic progressions. Even though 
the primes do not have positive lower density, Green and Tao established a 
suitable transference principle that allowed them to pass from Szemerédi’s 
theorem to the case when A is the set of primes. An important step in doing 
so is the construction of a “sieve majorant” for the indicator function of the 
primes. 


The Hardy-Littlewood circle method 


In a series of papers, Hardy and Littlewood introduced a general technique 
that gave them access to a wide array of additive problems. They called their 
approach the circle method for reasons that will become apparent shortly. 
Their ideas were further developed by I. M. Vinogradov and led to Theo- 
rem We describe them in the context of counting ternary arithmetic 
progressions. 


The starting point of the circle method is the orthogonality relation 


1 
(24.1) Iwo = [ e(an)da, 


valid for any integer n. Using it with n = n, + ng — 2n2 allows us to 

re-express the indicator function of the event that the integers n1,n2,n3 
od tear : ; 1 

are in arithmetic progression: 1p.—ny=n3—n1 = Jo e(ani)e(ang)e(—2ang)da. 

Multiplying both sides by A(n1)A(n2)A(ng) and summing over n1,n2,n3 < x 

yields the formula 


1 
(24.2) SSSDT Atm)A(n2) Ans) = : Se Sle bade. 


n2—N1 =Nn3—N2Q 


where S(x;@) is the additive Fourier transform of A, that is to say, 


Siew) = S A(n)e(an). 


NKxr 


The name “circle method” comes from interpreting the expression in (24.2) 
as an integral over R/Z (which is a circle from a geometric point of view). 


Formula is what we would call a “gambit” in chess. The gain it 
offers is that it transforms the unknown expression on its left side into a new 
expression that we can hope to estimate using our knowledge for the sum 
S(x;a@) from the previous chapter. However, to achieve this transformation, 
we had to make a sacrifice: the trivial bound for the expression on the right- 
hand side of is eee A(n))3da =< «3. This means that we must 
somehow recover the loss of a factor of 1/a. There are two key ideas that 


252 24. Ternary arithmetic progressions 


will allow us to compensate for this loss: 


e There is a set 9M of a € [0,1] (called the set of major arcs) that 
“dominate” the integral in (24.2). This set has Lebesgue measure 
(log x)° /z, and the sum S$(2;a) has size x/(log x)?“ on it. It 
thus contributes ~ x? -1/xz = x? to the integral of (24.2), which is 
the size of the expected main term. 

e The size of S(x;a) for a “generic” a € [0,1] is a!/?+°, This 
follows from Parseval’s identity 


(24.3) [ |S(x; a)| "da = S> A(n) 2 x glog x. 
Nx 
It turns out that the major arcs consist of those numbers that are close 
to rationals of small denominator. More precisely, we may take 


(24.4) Mt = UU [a/q —L/x,a/q+L/a] 
1<4¢<L, 0<a<q 
(a,q)=1 
with L = (logx)?4+7. We also define the set of minor arcs m := [0,1] \ M0. 
The motivation for this terminology is the fact that S(a;2a) < «/(log2)4+! 
when a € m. Indeed, for any such a, Lemma [23.4] implies the existence of 
a reduced fraction a/q with gq < #/L and |a — a/q| < L/(qx) < 1/q?. Since 
a ¢ MM, we must have gq > L. Hence, the claimed bound on S$(x; 2a) follows 
by (23.40). Together with (24.3), this implies that 
2 


1 
ate = ; 2. F ae 
i |S(a; a)*S (2; -2a)|da < / |S(x; a)| ia (log a)4" 


Thus most of the contribution to the right side of (24.2) comes from a € SM. 


It remains to estimate S(x;a@) when a € Mt. Consider first the case 
when a = a/q. Then, we may argue as in (23.19) to prove that 


Co H@) re -eV bEe 
(24.5) d_ A(ne(an/a) = Fry t+ Oal **) 


for some c > 0; indeed, here g < (log x) , so the Siegel-Walfisz theorem 
is applicable and we do not need to appeal to the unproven Generalized 
Riemann Hypothesis. Finally, if a = 8+a/q with |B| < £/x, we may adapt 
the proof of (23.21) to pass from (24.5) to an estimate for S(x;q). 

Putting together the above estimates leads to a proof of Theorem 
The details we have omitted are presented in Chapter 26 of Davenport’s 
book (the treatment there concerns the ternary Goldbach conjecture, 
but it can be easily adapted to our setting). Here, we will give a different 


N<x 


2A+7 


Relation admits a short self-contained proof: we use the identity |z|? = zz to write 
|S(a; @)|? as a double sum over ni, ng < x, and then note that £ e(a(ny — n2))da = 1n,=no- 


Making the entire circle a minor arc 253 


way of calculating the main term of Theorem Before proceeding, let us 
pause for a moment to observe the amazing complementarity of the available 
estimates: Vinogradov’s method can handle $(2;a) when |a — a/q| < 1/q@ 
with q > (log ayer, and stops working well for smaller g. However, the 
remaining range of q is precisely what is covered by the best-known version 
of the Prime Number Theorem for arithmetic progressions. This allows us to 
handle the full range of integration and prove Theorem[24.1] unconditionally. 


Making the entire circle a minor arc 


Given arithmetic functions f,, fo, fg, we define the weighted count of ternary 
arithmetic progressions 


T (fi, fa, fase) = S52 S2 fil) fo(n2) fa(ns). 


1,N2,N3RKx 
ny tn3=2n2 


Our goal is to estimate 7(A, A, A; x). We will extract the main term to this 
quantity by decomposing A. 

Rather than employing Vaughan’s identity, it is more convenient to use 
(23.14). This leads us to the formula 


(24.6) Tikka SY, Fee) 
Fis f2f3€{ Mi eyerE} 
with E = A’ 


reve t sieve. If y and D are chosen as in (23.18), we will show 
that any summand involving at least one term f; = E is negligible. The key 
to proving this is the aperiodicity of the function AY. that manifests itself 
in Theorem below. A close examination of its proof reveals another 
instance of the complementarity of Vinogradov’s method and of the Siegel- 
Walfisz theorem to which we alluded above (though it is subtler now, with 


the Siegel-Walfisz theorem hiding within the proof of Corollary [13.4). 


Theorem 24.3. Consider x,y, D > 2 satisfying (23.18) with 0 < 0, < 4) < 
1, and fiz A> 1. If A’... is defined by (23.16), then for all a € R we have 


sieve 


b 
Neve (n) e(an) KO, ,02,A 


Nex 


os 
(log x)A" 


Proof. All implied constants might depend on 61,62 and A. Applying 
Lemma 23.4] with Q = «/(logx)?4+5, we may find a reduced fraction a/q 
such that 1 < q < Q and |a — a/q| < 1/(qQ). We use a different argument 
according to whether q¢ < (log x)?4*5 or not. 

First, we study the case when (logx)?4+° < q < x/(logx)*4+>. We 


begin by writing Ps = f*g with f(k) = les D, P-(k)>y MR) and g(l) = 
lpsy, P-(¢)>y log é. We then argue as in the proof of (23.39): we localize 


254 24. Ternary arithmetic progressions 


dyadically the factors of the convolution f « g as in (23.12) and then apply 
Theorem to each summand headin This yields the estimate 


Yo Micvelme(na) < Gras a + a + VEG) (log.e)°?, 


NnKx 


The right-hand is < x/(log x)“ by our choice of q, y and D, thus completing 
the proof of the theorem in this case. 

Finally, we consider the case when q < (log x)?4+5. We begin by study- 
ing the case a = a/q. For each w € [,/x, x], we have 


S- Aeevalt Je(na/q) = > e(ba/q) S- Nieve(t ) 


n<w bEZ/qZ n<w 


n=b (mod q) 
(24.7) = Ss" e(ba/q) > log ¢ Ss" p(k). 
bEZ/qZ y<l<w/D D<k<w/é, P~(k)>y 
P-(£)>y k£=b (mod q) 


Fix @ and b momentarily, and let d = (¢,q). The congruence ké = b (mod q) 
is equivalent to having d|b and k = ¢'b/d(modq/d), where ¢’ denotes the 
inverse of £/d mod q/d. Since w/é > D = exp{ (log x)™} = exp{ (log y)2/%} 
and q < (logx)?4+> = (log D)@4+5)/%, Corollary applied with m = 
I],<yP implies that the innermost sum of (24.7) is < (w/£)/(log gyre, 
Thus, 


w/e w 
Ss" Mieve(N )e(na/q) <K s. log ¢- (lo g x) SA+12 < (log g pate 


n<w bEZ/qZ, l<w 


uniformly for w € [\/z,2] and q < (logx)?4+°. To pass to an estimate for 
es Ae. o(nje(na), we write a = a/q + 8, so that |8| < (logx)?4+5/x. 
Using partial summation similarly to (23.20) implies that 


> Nreve(n)e(nar) = [ie (Sw) )d S> Meee lt? )e(na/q) < «<——{ 
Jeenke crest (log x) 


Since we also have the trivial bound )/,,< /¢ M. ve(ne(na) < V/#log* x by 
noticing that |A 


fn log, the theorem follows. 


a us now prove our claim that any summand on the right-hand side of 
with f; = E for some j is negligible. For eee assume that 
rs = a the other cases follow similarly. Arguing as in , we have 


Ch aoe [ Si aves lade) Slacalae, 


where Sr(x;@) = Do nce f(n)e(an) denotes the additive Fourier transform 
of the arithmetic function f. Since we have assumed that fg = E, Theo- 
rem 24.3]implies that 9, (2; a) = O4(x/(logx)4+?) for all a € R (ie., there 


Making the entire circle a minor arc 255 


are no “major arcs” anymore). Consequently, 


T (fi, fo, fa3 2) oof |S‘, ( x; 1a)S2,( 2; —2a)|da 


x 1/2 


_ 2 : 2 
< aaa [ |S, (x; @)| aa Sp, (x; —2a)| da) 


from the the Cauchy-Schwarz inequality. Parseval’s identity implies that 
{. |Sp(x; ka) /?da = D,<,|f(n)/? for any k € Z\ {0} (see (2£3)). When 
fe 6 eee 2 we have | f| < 2log and thus }>,,<, |f(n)|? « «(log x)”. To 
conclude, we have proved that 

T (fi, fo, f3;0) <a 2? /(log x)4 when there is fj = E 
This reduces Theorem [24.1] to proving the following estimate. 


Proposition 24.4. Assuming the set-up of Theorem and with A. 
defined by (23.15), we have 
2 


T Mino Mee Me™ = 110 aap) | Osteen) 


p23 


sieve 


Proof. As we will see, the proof boils down to a lattice-point counting esti- 
mate. For technical reasons to be explained later, we break the summation 
into short intervals. To this end, set 7 = 1/(log2)4+° and let J be the 
largest integer such that (1 —1)7 > 1//z. Since |A¥ vel < log x, we have 


T (Ab Neca eee x) = > > > I] eit nj) — Olax . i 


sieve? 
a(1— ”) \Jtleni, n3Kx 1<j<3 
2n2g=n1+N3 


If we cover the range of n, and n3 by intervals of the form (a(1—7)/*1, «(1 — 
n)|, where 7 € ZM[0, J], then the theorem is reduced to proving that 


(248) DUDS TT. Miievelmy) = pn? erars(1 + O(1/(log x)*)) 


23 (1—)<nj <2; (j=1,3) 1<j<3 
2ng=n1+Nn3 


for 21,23 € [Yx, x], where p = ]]3<,<y(1 — 1/(p — 1)?). 
Fix 21,23 as above and let T(x, 23) denote the sum in . We have 
(24.9) T(ai,03)= SY SoS] wlki) uke) (ks) L (ka, ke, ks), 
kj<D, P~ (ky) >y Vj 
where 
L(k1, ke, kg) := ee (log £1) (log £2) (log é3). 


xj(1—n)/kj<0;<a/k; (j=1,3) 
2helo=k1 li +k3 3, P= (t;)>y Vi 


256 24. Ternary arithmetic progressions 


We fix ki,k2,k3 < D free of prime factors < y and estimate L(k1, ke, k3). 
Since the variables k; are weighted with the Mobius function, it suffices to 
consider the case when they are all square-free. 


We rewrite L(k1, k2,k3) in the notation of Chapter [18] First of all, we 
remove the logarithmic weights from the variables ¢; using the fact that 
we have restricted them into short intervals. Indeed, the conditions x;(1 — 
n)/kjy < €; < «/kj for 7 = 1,3 imply that xo(1 — 9)/ke < l2 < we/ke 
with v2 = (a1 + %3)/2. We thus have log; = log(x;/k;) + O(n) = (1+ 
O(n/ log x)) log(a;/k;) for each 7. As a consequence, 


L(k1, ka, kg) = (1+ O(n/logx))SW,P) [J log(a;/k;), 
1<j<3 
where P = {3 < p< y} and W = (w,)°, is the sequence of weights 
lylols = n, Qkolo = kyl, + kz, \ 
j(1—n)/ky <€; <a/kj (9 = 1,3) 


To estimate S(W,P), we apply Theorem [18.i1{a). We must first check 
Axioms 


Given d|P, we set Wg = >°,.=0 (moda) Wn: Then, 

2{ Lilobs, d\tilol3, 2kolo = kil1 + k3e3, \ 
aj1—m)/ky <t; <a/kj (j= 1,3) 
To estimate Wy, we first split the range of the pairs (1, 23) according to their 


reduction mod 4, mod kz and mod d (which are mutually coprime integers). 
The permissible reductions lie in the sets 


A = { (a1, a3) € (Z/4Z)? : kya, + k3a3 = 2(mod4), a,a3 = 1 (mod 2) }, 
B = { (b1, 3) € (Z/koZ)? : kyb, + k3b3 = 0 (mod ky) }, 
C ={ (c4, 03) € (Z/dZ)? : c4¢3(kyc + k3c3) = 0 (mod d) }. 
Given (a1, a3) € A, (bi, b3) € B, (c1,¢3) € C and (é1, £3) € Z? such that 
(24.10) €;=a;(mod4), ;=b;(modka), €;=c;(modd) (j =1,3), 


Wn = loin : #{ (1, £2, €3) E Zz : 


Wa= #{ (1, 2, £3) € Z : 


the equation 2ko9ly = k,l, +k3é3 has a unique solution ¢2 which is necessarily 
an odd integer. In addition, the number of pairs (1, 23) that satisfy (24.10) 
and the inequalities 7;(1— 1) < 0; < 2; for 7 = 1,3 equals 


2 
Lj © 1X3 x 
O(1)) = + O 1). 
I Grae +0(1)) 16k kek 3d? = min{ hy, kg}d ) 


JE{1,3 
Therefore, 
2 
2123 nx 
Wa ce a eee )) |A] - |B] - |C| 


Making the entire circle a minor arc 257 


We easily see that |A| = 2. In addition, since d is square-free and coprime to 
kk3, the Chinese Remainder Theorem implies that |C| = [],)q(8p — 2). By 


a similar argument, |B| = [pits pion ned? Lp thashasnay 2 = kg - (k1, ko, k3), 
since ko is square-free. Putting everything together, we conclude that 


[ [Gp - 2) + 0(3° (nx + k5a)). 
p\d 


1°x103 (k1,k2, ks) 


Wa= 3a kikoks 


Therefore, Axiom[I] holds with X = UTES Pa tasts) | v(d)=[]ja(8—2/p) and 
ra < 3° (nx + k3d). In addition, Axiom DJhelds with « = 3, and Axiom [3] 
with D = 2/2, A=«K+4+1 and m =1, since ky, kz, k3 < exp{(logz)”} = 
2°) here. Hence, Theorem [18.11 a) implies that 
1) 2 (Ai, ko, k3) 

L(k, ka, =(1 ()). pre a ie a l ki), 

(ky ko k3) +O, ioe An L1Xx3 ikoks II og (a; / 3) 

JE{1,2,3} 
where \ = 87! Hos<pey(1 — 1/p)(1 — 2/p). Together with (24.9), this gives 
(24.11) T (21,23) = An*x123M + Oa(n*x123(log x)°), 
where 
ky, ko, kz) 
cS Ga TL atsvewtet 

and in the calculation of the error term of (24.11) we used the bound 
ee es tats) < (log x)?, which can be seen by letting g = (k1, kz, k3) 
and k; = gnj;. Since 7 = 1/(log a)4+5, (24.11) reduces (24.8) to proving that 


(24.12) M = [[Q—-1/p)* + Oa(1/(logx)*). 
Py 


We claim that we may replace (ki, k2,k3) by 1 in all summands of M at 
the cost of a small error term. Indeed, if g = (ki, k2,k3) > 1, then g > y 
since P~(g) > y. Hence, if we write kj = gnj;, then (g — 1)/(kikek3) < 
1gsy/(g?ningng3). Since sae 1/9? <1/y, we find that 


eu) Me SY Te PES + o((og.2)*/y) 


kj<D, P~(kj)>yVjI=1 


If we let M; = Ue Pte kit U(k;) log(a;/k;), then the main term 
of wae factors as M,M2M3. By Corollary we have the estimate 
Lik<w, P-(k)>y H(K) & w/(log w)4+? for w > D. Hence, partial summation 
implies that 


Mj = (log x3) - (1/¢y)(1) + (1/Gy)'(L) + Oa(1/(log x)*) 


258 24. Ternary arithmetic progressions 


with G(s) = )’p-(x)sy% ° defined as in Chapter 22] Since G(s) ~ (s — 
1) TI ,<y(1— 1/p) and G(s) ~ (s-1)7? TIp<y(1- 1/p) as s + 17, we infer 
that (1/¢y)(1) = 0 and (1/¢y)’(1) = [],<y(1- 1/p)~*. Relation then 
follows, thus completing the proof of (24.8), and hence of the theorem. 


Binary additive problems 


It is natural to wonder whether the circle method can be used to approach 
binary additive problems such as the twin prime conjecture. For this prob- 
lem too we have a formula analogous to (24.2): 


1 
(24.14) So A(n)A(n +2) = / Stele aade 


NKx 
We still expect the main term to come from the major arcs. As a matter of 
fact, it can be shown rigorously that if we define MN by (24.4), then 


[ |S(x; a) |?e(—2a)da ~ cox, 
pire 


where cz = 2] [5301 — 1/(p - 1)?) is the twin prime constant. However, 
it is not known how to show that the minor arcs contribute a negligible 
amount. Indeed, as we discussed before, Parseval’s identity implies 
that S(x;a) = «'/?+°) for a “generic” a € [0,1]. Hence, no matter how we 
choose m, we cannot expect to have a better bound than 


i |S(x;a)|?da « xt), 
m 


which is of comparable size to the expected main term. This means that 
in order to prove the twin prime conjecture using the circle method, we 
must exploit cancellation between the various parts of the integral in (24.14) 
coming from the factor e(—2a). 


Exercises 


Exercise 24.1. If N is an odd integer, prove that 
SoS 235 Ara) A(na)A(ns) = Gy N? + O4(N?/(log N)*) 
nytn2g+n3=N 

for each fixed A > 1, where Gy = [],)y (1 —1/(p— 1)?) I] yw (1 + 1/(p — 1)?). 


Exercise 24.2. If A‘ 


sieve 


So Ab eve(M) Ab eye(m + 2) = cow + Oa(c/(log x)4) 


nxu 


for every fixed A > 1, where cz = 2]],53(1 — 1/(p— 1)”). 


is as in Proposition 24.4] prove that 


Chapter 25 


Bilinear forms and the 
large sieve 


Let us suppose we are given a sequence of complex numbers (€n)nez and 
we wish to study its distribution among the different arithmetic progressions 
mod q. To be more precise, our aim is to determine the behavior of the sum 


S> Cnly=j (mod q) 


H<n<x<H+N 


when j varies over Z/qZ, with N > 1 and H being two given integers. To 
this end, we consider the additive and multiplicative Fourier transform mod 
q of the above sum (viewed as a function of j). These are given by 


St(a/q)= > cne(an/g) and S*(x)= >> eax(n), 


H<n<H+N H<n<H+N 


with a running over Z/qZ and x over all Dirichlet characters mod q. 


For a general sequence (Cn)nez, we cannot obtain a non-trivial pointwise 
bound for St (a/q). Indeed, if ¢, = e(—na/gq) for all n, then St(a/q) =N. A 
similar obstruction holds for S*(y), by taking c, = ¥(n) for some Dirichlet 
character +. However, we will prove non-trivial bounds on $t(a/q) and on 
S*(\) when we average over many a and q, or many x. 


With the above goal in mind, we consider the sums 


(5.1) SY > So [st(a/q)? and SS aah Calke 


q<Q a€(Z/qZ)* a<Q, x (mod q) 


260 25. Bilinear forms and the large sieve 


where the notation )>* means that the last sum runs over primitive charac- 
ters. A few remarks are in order about the shape of these sums: 


e The factor g/y(q) is of order 1 most of the time, so it can be ignored. 
We include it for normalization purposes that we will explain later. 


e We sum only over reduced residues a (mod q) because we want the 
fractions a/q to be distinct mod 1. To see why working with the 
full sum Tt = a<Q Lva€Z/q2. |S*+(a/q)|? is problematic, consider 
the case when c, = e(—n/3), so that S*(1/3) = N. The sum TT 
contains |Q/3| copies of |S*(1/3)|?; one for each q < Q that is a 
multiple of 3. In particular, T+ > |Q/3|-N?. We thus see that 
the fact that (¢n)H<n<H+Nn is not well distributed with respect to 
a single modulus causes T’* to be very large. For a similar reason, 
we work exclusively with primitive characters instead of working 


with the full sum <Q > x (mod q) (4/P(4))|S* (x) ?. 


e We consider a second moment of S*(a/q) and of S*(x) (ie., an 
average of squares) because this gives us access to L?-techniques 
coming from the theory of bilinear forms. We develop this theory 
in the next section and use it to study the sums of in the 
subsequent section. 


Bilinear forms 


An M x N bilinear form over C is a function p : C“ x CX > C that is 
linear in both coordinates. This is equivalent to the existence of certain 
coefficients @mn € C such that 


for all # = (71,...,27) € C™ and all ¥ = (y1,..., yn) € CX. 


Before we go into more abstract matters, let us discuss a few examples 
that illustrate the central role of bilinear forms in analytic number theory. 


Example 25.1. (a) If we take amn = lmn<x:e(mna), then w(#, 7) is equal 
to dincalf *g)(n)e(na), where f(m) = Ip aj(m)am and g(r) = Ipay(™) yn 
(see relation (23.31). 


(b) If {a1,...,a ¢} denotes the set of reduced fractions a/q with 1 < 
a<q<Q (often called the Farey fractions of order < Q), H is some fixed 
integer, and we set dmn = e(@m(n+ H)), then 


M N 


W(z,¥) = ~~ Lm Se yne((n + H)am). 


m=1 n=l 


Bilinear forms 261 


In particular, when tm = ~ Yne(—(n + H)am) for all m, we have 
M 


WEI = Do 


Letting Yy, = Cn+H, We see ene “Hg, ¥) is equal to the first average of 
(25.1). Hence, pointwise bounds on wz.) supply information about the 
distribution of (¢n)y4¥<n<H+N in arithmetic progressions. 


N 2 


dhe ((n + H)am) 


=1 


(c) We can state both of the above examples in unified notation: let 
H €Z, M,N © Zs, and {a1,...,a,¢} be a set of real numbers. In addition, 
for each m € [1, M] NZ, let Im be a subinterval of [1, N]. We then consider 
the bilinear form with amn = e(@m(n + H)) + 11,,(n). 

We immediately see that the bilinear form of part (b) can be recast in 
this notation. On the other hand, if a7, = ma and I, = [1,2/m]N[1, N], 
then we recover the bilinear form of part (a). 


Example 25.2. Let C = {x1 (modqi),..., x (mod qm)} bea set of Dirich- 
let characters, and consider the bilinear form with coefficients @mn = Xm(n+ 
H)\/dm/0(dm). (The factor \/gm/v(dm) normalizes the vectors (dm,n)>_1 
to all have roughly the same ¢?-norm.) Arguing as in Example 25.1(b), we 
see that this form is related to the second average of (with C being 
the set of primitive Dirichlet characters of conductor < Q). 


Example 25.3. Bilinear forms can be generalized to multilinear forms in 
a straightforward way. For instance, given k € Z, let amngq = 1(q,4)=1 - 
(1imn=k (mod q) — 1(mn,q)=1/¥(@)) and consider the M x N x Q trilinear form 


(z; y, Z) = a », % am,n,qt@mUn~q 


m<M n<N a<Q 


= > aa( wy nth — ‘> nth). 


a<Q m<M,n<N ac m<M,n<N 
(q,k)=1 mn=k (mod q) (mn,q)= 1 


This expression controls the distribution in certain arithmetic progressions of 
the function f *g, where f(m) = 1j1,7j(™)-%m and g(n) = 1p,ny(")-Yn- 


The norm of a bilinear form. The power of the method of bilinear forms 
lies in its ability to produce pointwise bounds for ~(Z, 7) valid for general 
vectors and y. The key notion for doing so is the norm of w. It is defined 
to be the smallest positive real number ||7)|| such that 


(25.3) WEA < Wl (lzlleli#llz for all Fe CY, FEC, 
where ||-||2 denotes the usual Euclidean norm, defined by |]i'|]}. = v?+---+03 
for a vector 0 = (v1,...,vg) € C?. The emphasis should be put here on the 


assumption that (25.3) is true for all vectors 7 € C™’ and 7 € CN. 


262 25. Bilinear forms and the large sieve 


The existence of ||~|| is easy to establish: if Z = 0 or 7 = 0, then 
is trivially true. Otherwise, we set 0 = Z/||#|/2 and w = ¥/|l¥\/2, so 
that ¢ € S™” and w € SN, where S¢ denotes the d-dimensional complex 
unit sphere. The bilinearity of 7 renders equivalent to the inequality 
\-b(8, B)| < ||| for all je S™, we SN. We thus find that ||| is equal to 
max{|7)(v, w)| : 7 € S”,w € SN}, which exists by the compactness of S™/ 
and S$, and by the continuity of ~. 

Example 25.4. In Theorem [23.6] we saw an instance of (25.3). Indeed, we 
can reinterpret Theorem [23.6] as stating that the norm of the bilinear form 
w of Example B5.1(a) is < (q+ M+ N +MN/q)'/?(log(2q))!/?, where a/q 
is a reduced fraction such that |a — a/q| < 1/q?. In addition, the discussion 
in Remark3.7limplies that ||~)|| >> (M@+N+MN/q)'/? when a = a/q. 


A spectral interpretation of the norm. To gain some intuition about 
what the norm of a bilinear form measures, we study the special case when 
M = N and the coefficients of ¢) form a Hermitian matrix (i.e., dmn = Gn,m)- 
We then know from the Spectral Theorem for Hermitian matrices that C’ 
admits an orthonormal basis of eigenvectors of A, say &, ..., @y. Let Aj, 
..., Aw be the corresponding eigenvalues, which are real numbers. We claim 
that 


(25.4) Ill] = max{]Ai],-.., Awl. 


To see this identity, it is convenient to work with the Hermitian analogue of 
w, defined by 


A straightforward computation reveals that $(€n,€n) = Im=n-‘An- Hence, if 
we express % and 7 with respect to the basis {€1,...,é@y}, say = ae SnEn 
and 7 = sy tnén, then 


o(z, y) = ae» Sinta®lEmsEn) = S> Asati 


l<m,n<N l<n<N 
If L = max{|Aj|,...,|Aw|}, then 
IME DI<L Y- [Sntnl- 
l<n<N 
We then apply the Cauchy-Schwarz inequality to the sum on the right side. 
Since \7*_, |sn|? = ||a#|]3 and 37, |tn|? = ||7|[3 from the orthonormality 
of the vectors €1, ..., @n, we conclude that |¢(z,7)| < L- ||Z|J2||7|/2 for all 


#,9 € CN, whence ||w|| < L. 

On the other hand, we have |¢(€,, €,)| < ||w|| for all n, as it can be seen 
by the definition of ||7|| and the fact that ||&,||>2 = 1. Since $(&,, €,) = An, 
we infer that |||] > |An| for all n, thus completing the proof of (25.4). 


Bilinear forms 263 


In conclusion, we may think of the norm of 7 as something like the size 
of the large eigenvalue of a matrix with number-theoretic properties. 


Approximate orthogonality. In practice, it is hard to get a handle on 
the eigenvalues of the matrix of coefficients of ~. Instead, we develop a 
different method that has as its starting point the obvious relations 


(25.5) V(z,9 = Soon Stn and w(z,¥) = Sm So aman 


and which does not require a to equal N. 


As we briefly explained in Chapter the importance of the above 
relations is that they express 7(Z,y) as an average of averages. Applying 
the Cauchy-Schwarz inequality to the first identity of (25.5), we find that 


M N 
2 
(25.6) WET)? < NEB D> | So amon 


m=1 n=1 


This maneuver allows us to eliminate the unknown coefficients x,, and 
smoothen out the variable m, which is now weighted with the function 1. 
Similarly, if we apply the Cauchy-Schwarz inequality to the second identity 
of (25.5), we obtain the bound 


N N 2 
(25.7) WEI? < IIB SO | Yo annem 


n=1 m=1 


We open the square in (25.7) using the identity |z|? = Z- z to deduce that 


N M M 
Waar <i S| So Griptentrantinn 


n=1 = mg=1 


= (Wal3 > 3S Zima 9 im na 


my=l1m2=1 


(25.8) 


In particular, if the sum see G@m1,nAmo,n is “small” whenever m1 4 m2, we 
hope to obtain a non-trivial bound for the sum in (25.8), and hence for ||q]]. 


There is a more conceptual way to interpret what we did above. We can 
write the first identity of (25.5) as 
(25.9) W(E9)= So am (Fim), where Tm = Gm,1s---»Fm,n); 


and the second ee of (25.5) as 


(25.10) (#,¥)= Se (Z, Gn) where wn = (Gin,---;EMn): 


264 25. Bilinear forms and the large sieve 


Similarly, (25.8) becomes 
M M 


(25.11) w(z, g)|\? < wll » S> EmiZmy * (Umi; Umo)- 


my=1m2=1 


Hence, we see that ||~|| should be small when the inner products (Um, , Um) 
are small whenever m, 4 mo, that is to say, when the vectors Uj, are ap- 
proximately orthogonal to each other. The relevance of this property can 
be seen more easily via relation (25.9): a fixed vector 7 cannot correlate 
strongly with many of the approximately orthogonal vectors U,,. Hence, 
most of the inner products (¥, U,) should be small compared to the trivial 
bound ||7||2|/%m||2 coming from the Cauchy-Schwarz inequality. This should 
yield a non-trivial bound on the sum of (25.9). 


Remark 25.5. It must be stressed that in order to exploit the approximate 
orthogonality of the vectors v,,, we have to start with (25.10) that involves 
the vectors w,, rather than with (25.9). 


In order to get a sense of what to expect as a bound on ||7|| when the 
vectors Um are approximately orthogonal to each other, we study the ideal 
case when they are truly orthogonal to each other. Essentially, the lemma 
we prove below constitutes a generalization of to non-square matrices. 


Lemma 25.6. Let ~,tm be as above, and set L = max{||ti|l2,.-., ||Gac|lo}- 
Then ||w|| > L. If, in addition, we assume that the vectors v1,...,vy are 
orthogonal to each other, then |||| = L 


Proof. Let €),...,€,¢ denote the vectors of the standard ie - C™. For 
any m < M, we have W(Em; Um) = |\%n||%. On the other hand, 3) implies 
that U(Em, Gm) < ||| - |]Gm||.. We thus conclude that ||w|| > Bae Since 
m can be chosen arbitrarily, the first part of the lemma follows. 


For the second part, note that the set {v1/||ti|lo,...,0/||Ga|lo} is or- 
thonormal when the vectors v1,...,0,¢ are mutually orthogonal. Combining 
(25.6) with Bessel’s inequality, we deduce that 


M 
EDP <3 SO Gam)? < PZB y eae =) 
m=1 mm 


< < PII 118 


In particular, ||~|| < L, which completes the proof. 


Remark 25.7. Working with the vectors w,,, and with relations (25.7) and 
(25.10), we can show that ||w|| > max{||Wi|l2,..., |/Wa||2}. Moreover, this 
lower bound is sharp when the vectors w,, are mutually orthogonal. 


Bilinear forms 265 


We now return to the more general case when the vectors Uj are approx- 
imately orthogonal to each other. In Theorem [23.6] we saw one example of 
how to exploit the mutual quasi-orthogonality of the vectors Uj. Below, we 
give another one that deals with Example 25.2] when C = {x (mod q)}. 


Example 25.8. Given q, N €N, let ~ denote the bilinear form with coeffi- 
cients mn = Xm()V/¢/p(q), where ym ranges over all Dirichlet characters 
mod q and n € ZN [1, N]. We wish to bound the norm of w. 


Note that 


O(,/¢q1 if 
(25.12) \(Gm1,tmo)| = q | (/¢ og q) if my, # mao, 


(9g) | Nv(q)/q+ O(2°™) otherwise, 


with the first case following by the Pdélya-Vinogradov inequality and the 
second one by Theorem [2.1] Inserting (25.12) into (25.11) implies that 


es yP <8 So eal ve Sel o( 5 288)), 


mi=lme2=1 


Using the inequality |zw]| < (|z|? + |w|?)/2 and the symmetry of the above 
double sum over m, and mz, we find that 


M M M M 
So SS lami tml < So emi? SO 1 = MIZI5 = e(@IZI3. 
m=1 m=1 


mi=lme2=1 


As a consequence, 
(25.13) WEA? < (N+ OF? log gE NSNTI5 


for all ¢ € C™” and 7 € CX, that is to say, ||y|/?_ < N + O(q?/? log q). 
Comparing this upper er to the lower bound in Lemma yields, we 
see that it is sharp when N > q?/? log q. 


The bilinear form w of the present example has the special feature that 
we can easily control the inner products of the dual vectors wy, too: 
q 


(q) » X(ri)x(ns) =@> Lining.g)—1 > Lana Goody) 


x (mod q) 


To exploit this identity, we start with (25.9). The analogue of (25.11) in 
this setting is 


N N 
WEA? < NEE DS SS Grr ye» (Wer, Wms) 


ny=lne=l1 


(Wns ’ Wny) = 


N N 
= eal “qd > S° Yn Une . Linyne,q)=1 : Iny=ny (mod q): 


ny=l ng=1 


266 25. Bilinear forms and the large sieve 


Since |ya,Ynel < (na |2-+lagl2)/2 and #H{n < Nin =a(modg)} <1+4N/q 
for all a € Z, we conclude that 


(25.14) WED? <W + lela 


for all Z€ C’ and ¥ € CX. Hence, |||? < N+, which improves upon 
(25.13) and is sharp in view of Lemma [25.6 


Duality. The discussion of the above example brings forward an impor- 
tant notion that underlies the theory of bilinear forms. Consider the lin- 
ear operators T : C@ -> C% and T* : CN + C™, defined by T(Z) = 
eae jit) 4 and. gy = ee 1 GmnYn)_. We then have 


(25.15) Vey") = (TY) = (T(@),9), 


where 7* = (Y,...,Y,,). This relation is an equivalent way of writing 
(with the coordinates of ¥ conjugated), and it implies that T* is the adjoint 
operator of T. 

Now, note ae eae the Cauchy-Schwarz inequality to the third 
expression of (25.15) yields the upper bound |y(z,y*)| < ||T(Z)|l2|lylle- 
Moreover, Mea is an aig when ¥ = T(z). Consequently, ||7|| is equal to 
the smallest positive number B such that ||T(Z)||2 < B- ||Z|/2. This number 
B is precisely the norm of the operator T, which we denote by ||7'||. 

We have thus proven that ||w|| = ||T'||. Similarly, working with the 
second expression of (25.15), we can show that |||] = ||7*||. In particular, 
we see that T and 7* have the same norm, a well-known fact from functional 
analysis called the duality principle{167| Proposition 5.4, p. 183]. 


To sum up, we have proved the following result. 


Theorem 25.9. Let (Gmn)m<M,n<Nn be some complex coefficients and let 
A>0. The eee statements are equivalent: 


og — Pantanal < AllZlla|lZ|l2. for alle C’, FEC. 
pa ape ctpental” < A?||Z||2 for all Ze C™. 


: yy | L Gmnel <A2|9|2 for all ge CN. 


Remark 25.10. Assume that |a@m | = 1 for all m,n. In view of Lemma 


and Remark we have that ||2|| > max{/M,/N}. Assume for 
simplicity that N < M. Hence, the best possible result we can hope for is 
of the form ||| < VM. In view of Theorem [25.9] this would imply that 


1 M N 2 
(25.16) ivi S> | Sees 


m=1 n=1 


<N whenever max |yp,| < 1, 
l<n<N 


The large sieve 267 


since ||7|l2 < VN when lyn] <1 for all n. If we let S,, = Se Ynam,n, We 
can interpret as saying that S,, < N'/? on average over m < M. 
Notice that this is approximately the square root of the trivial pointwise 
bound |S;,| < N. We then say that S,,, exhibits square-root cancellation on 
average over m < M. Consequently, in the special case when y, = 1p(n) 
and Gm.n = Xm(n), with xm running over an appropriate family of Dirichlet 
characters, relation can be interpreted as an averaged version of the 
Generalized Riemann Hypothesis (see Exercises [8.6] and [11.3{(b)). 


The large sieve 


One of the most important applications of the theory of bilinear forms is 
the large sieve. It was first discovered by Linnik while studying sieve 
problems with unbounded sifting dimension (hence the name “large sieve” ). 
We will present these arithmetic applications in the next section. First, we 
develop the large sieve it in its abstract form, as an inequality for additive 
and multiplicative characters. 


Our goal is to bound the norms of the bilinear forms associated to the 
sums in (25.1). We begin by studying the first of these forms. We consider a 
slightly more general set-up: let {a1,...,aR} be a set of relation numbers. 
We want to obtain bounds for the sum 


(25.17) d= St (ar))?, 


1<r<R 


where we have extended $* to all real numbers by letting 


S*oa)= > Cne(na). 


H<n<H+N 


In view of Theorem [25.9] the underlying bilinear form has coefficients a; = 
e(na,) with r € {1,...,R} andne {H+1,...,H+N}. We then consider 
the vectors 

Uy = (e(—nar)) H<en<H+N 

and note that 


a 1 
(25.18) (Ors te) = SS e(n(as = Qr)) < las — a] 
H<n<H+N one ae 
by arguing as in (23.34). Hence, if the set {a1,...,aR} is d-spaced mod 1 
(recall that this means that ||a; — as|| > 6 when r 4 s) with 6~! = o(N), 
the vectors v1, ..., Ug are approximately orthogonal to each other. 


The first average of (25.1) is of the form (25.17) with the set of points 
a, being the Farey fractions of order < @, which we denote by Fg. If a/q 


268 25. Bilinear forms and the large sieve 


and a’/q' are distinct elements of Fg written in lowest terms, then 
ae / / 1 1 
_ lag an oe 
qq qd’ Q 
for every integer n, that is to say, the set Fg is Q-?-spaced. Hence, we 
can bound Pages |S*(a/q)|? by working with the more general sum of 


(25.19) 


(25.17). For the latter, we prove the following fundamental theorem. 


Theorem 25.11 (The additive large sieve inequality). Let {a1,...,aR} 
be a set of real numbers that are d-spaced mod1, H € Z, N € Zs, and 
@= (cH41,...,cH4n) € C%. We then have 


R 
(25.20) > | S- Cne(na,) 


r=l1 H<n<H+N 


2 Ly] ay 2 
<(N +6“ )Ilells- 


Proof. Given any H’ € Z, we have 
) ene(na)| = | ) Cn—H'+He(na)|, 
AH<n<Ht+N H’<n<N+H’' 


by letting n = m — H'+ H in the first sum and then noticing that e(na) = 
e((H — H’)a)-e(ma). It thus suffices to prove the theorem for a special 
choice of H, and it will automatically follow for all other values of H. We 
shall take H = —| N/2| —1 that (almost) symmetrizes the range of n. 


Instead of proving (25.20), we use Theorem which tells us that it 
suffices to prove the dual inequality 


R 
(25.21) S- | S > bre(nar) 


H<n<H+N r=1 


: Lye H2 
<(N+0°°)Ilb]l3 


for allb = (b1,...,bR) € C®. We will open the square and take advantage of 
the mutual quasi-orthogonality of the vectors (e(na;))H<en<H+N, but first 
we smoothen the n variable a bit further. This will improve the quality of 
the bound we obtain in terms of 6 (see Remark 25.12). 


Since we have assumed that H = —|N/2| — 1, we have that |n| < N/2 
in the left-hand side of (25.21). Therefore, 


R 2 R 
| >> Pre(nar) <2 So (1- In| /N)| S > bpe(nay) 


H<n<H+N r=1 In|<N r=1 


Expanding the square on the right side as in (25.8), and bringing the sum 
over n inside, we find that 


R 2 R R _ 
S- | D2 Pre(nar) < 25> S75, SD (1 = |n|/N)e(n(a — a5). 


H<n<H+N r=1 r7=1 sol |n|<N 


2 


The large sieve 269 


The innermost sum is the Fejér kernel Fy evaluated at a, — as. Indeed, 
recall the well-known identities 


N-1 . ; 
Fy (a) = = S- > e(ma) = > (1 — |n|/N)e(na) = 1 /sin(Nra) | 
“ N 


sin(7a@ 
n=0 |m|<n |n|<N ( ) 


In particular, we have 
1 \ 2 N 
’ Nsin?(ra)J ~ max{1,2N|lal|}2’ 


since sin(mz) > 2x for x € [0,1/2]. Together with the inequality 2|b,-bs| < 
|b, |? +|bs|? and the symmetry of the summation in r and s, this implies that 


R R R 
a |r|? + [bs|? 
b <N 
| d_ bre(nar) a max{1, 2N la; — as||}2 
HA<n<H+N r=1 r=l-s=1 


R R 2N 
25.23 = b,.|? : 
( ) Ds 4 2s imax{, IN]ar eal? 


f= 


(25.22) —- |Fy(a)|< min { Fry(0) 


It remains to bound the innermost sum over s. 
Let J = |1/(2N6)| andr € {1,..., R}. There is an ordering as,,...,Qsp 
with s; =r such that the sequence {]|as; — ar ||}, is increasing. Since the 
points as are d-spaced mod 1, we have ||as,, — a,|| > j§ when 1 < j < R/2, 

as well as ||as,,,, — || > j6 when 1 < j < (R—1)/2. Consequently, 
R 


2N AN 
S <2N + ) 
= 2 2 
“ max{1, 2N lar — as||} eee max{1,2N06j7} 
1 
<2N+4NJ + : 
+ N62;2 
jeJt+l 
1 
(25.24) <2N +4NJ + =: min{1/J, 7/6}, 


because >) is 741 g° < fe de = 1/0 and ey>1 7 7 = 07/6. 

If N > 1/(26), then J = 0, so the expression in is < N; on 
the other hand, if N < 1/(26), then J = 1/(.N6) > 1, so the expression in 
is x 6-1. In any case, 


R 


2N 
N+61. 
» max{LINJas 2a? = 


s=1 


Inserting the above estimate into (25.23) completes the proof of (25.21), and 
hence of the theorem. 


Remark 25.12. Had we not smoothened the sum over n in (25.21), we 
would have had to use (25.18) instead of (25.22) and, hence, to replace the 


270 25. Bilinear forms and the large sieve 


innermost sum of (25.23) by ><, ||ar—as||~t. This last sum is < 6~'log R 
(and this estimate is best possible for general d-spaced points a;). We would 
have thus proven (25.20) with N + 6~'log R in place of N +671. 


Remark 25.13. Montgomery-Vaughan [146] and Selberg [163] proved inde- 
pendently that the implicit constant in Theorem 25.11]can be taken to be 1. 
They also showed that N+1/6 can be replaced by N+1/6—1 Theorem 
7.7]. As Lemma [25.6] reveals, this is essentially best possible. 


As a direct corollary of Theorem[25.11and of relation (25.19), we obtain 
the following result. 


Theorem 25.14 (The additive large sieve inequality, II). LetQ >1, H € Z, 
N € Zs, and @= (cH41,...,CH+N) € C’. We then have 


SY | EY enelna/o)’ < wv +@?VIe13. 


q<Q a€(Z/qZ)* H<n<H+N 
We now turn to the second expression of (25.1). 


Theorem 25.15 (The multiplicative large sieve inequality). Let Q > 1, 
H€Z, N € Zs, and @= (cy41,...,cy4n) € CX. We then have 


SY gl LV cox] «<ov+ ope08 


q<Q, x (mod q) H<n<H+N 


where the notation )>* means that the sum runs over primitive characters. 


Proof. The associated bilinear form has coefficients am = %m(n), where 
Xm ranges over all primitive Dirichlet characters of conductor < Q, and 
n€Z0(H,H +N]. However, instead of working with this form, we will 
show that 


(25.25) 41S Isx~@P< SS Ist (e/a? 


eq) x (mod q) ae (Z/qZ)* 
for all g, and then invoke Theorem 25.11] where the functions S* and S* 
are defined as in the beginning of this chapter. 
When x is a primitive character mod q, Theorem implies that 


Si)= SD ema YD Xale(na/a) 
H<n<x<H+N 


LJ. ee aie 
= FH de, Mera 


The arithmetic form of the large sieve 271 


Since |G(X)| = \/@ for a primitive character x (mod q) by Theorem [10.4] we 
find that 


M4 

TR 
x 

& 

Nh 
| 


Ye xla)S*(a/a)f 


1 

x (mod ¢) 4. (modq) a€(Z/qZ)* 
1 
qd 


IX 


| XS xa)s*(a/a)). 


x(modq) a€(Z/qZ)* 


(25.26) 29) |st(a/aP, 


with the last relation following from Parseval’s identity (10.6) for multi- 
plicative characters. This proves (25.25), thus completing the proof of the 
theorem. 


The arithmetic form of the large sieve 


The use of the term “sieve” in this chapter is not at all evident, since what 
we have talked about so far bears no resemblance to the sieve theory we 
developed in Part [4] However, it is possible to use the large sieve inequality 
to deduce a rather strong sieve upper bound, thus justifying the terminology 
“large sieve”. 


The set-up is slightly different compared to the one we saw in Part [4] of 
the book. To motivate it, consider a polynomial F(x) € Z[z] and the sets 
A={F(n):n< x} and P = {p < y}. Then, S(A,P) counts integers 
n <a such that p{ F(n) for all p < y. Equivalently, if we let 

Rp ={me€ Z/pZ : F(m) =0(modp) }, 


then S(A,P) counts integers n < x such that n ¢ Rp, (modp) for all p < y 
(We then say that n “avoids” the set R, for all p < y.) Notice that the set 
A satisfies Axiom [I] with v(p) = |R,|. In particular, the function y* we saw 
in Chapter 2i]and in Theorem 21. JJis given by 


n) =n] [GQ —|Rpl/p). 
pin 


With these remarks in mind, we now state the main result of this section. 


Theorem 25.16 (The arithmetic large sieve inequality). Lety >1, HE Z 
and N € Zs,. In addition, for each prime p < y, let Rp C Z/pZ. If 


NC{H<n<H+N:n¢ Rp(modp) for allp<y}, 
then 


#N «K(N+y? )/ em) where fm) = TT a. 
D 


m<y pim 


272 25. Bilinear forms and the large sieve 


Proof. The theorem is trivial if Rp = Z/pZ for some p < y, because V = 0) 
and f(p) = oo in this case. So let us assume that Rp, # Z/pZ for all p < y. 


The main idea is that if NV avoids many residue classes modulo each 
prime p < y, it must be unevenly distributed among residue classes of arith- 
metic progressions of moduli < y. The large sieve inequality implies that 
this can only happen if NV is very sparse. 


More concretely, let (Cn)nez be a sequence supported on N, and fix for 
the moment a prime p < y. To study the distribution of (cpn)nez mod p, we 
equip Z/pZ with the uniform counting measure (that is to say, P(A) = |A|/p 
for A C Z/pZ) and consider the random variable X : Z/pZ — C defined by 


A= See 
n=a (mod p) 


We will study the variance of X. On the one hand, V[X] must be large 
because WV avoids the set R,, and thus X(a) = 0 whenever a € R,. On the 
other hand, we will express V[X] in terms of the additive Fourier transform 
of X, which will allow us to study it using the large sieve. 


We start by recalling that 
V[X] = E[|X — E[X]|?] = E[|X|*] — |E[X]|?. 
To take advantage of the fact that X(a) = 0 when a € Rp, we use the 
Cauchy-Schwarz inequality: we have that 
1 22 B= Ral 
eixyP=|° x] <<? -BIXP, 
a€(Z/pZ)\ Rp 
Since ELX] = >0,,¢z n/p, we conclude that 


(25.27) V[X] > G = ~ 1) |ELX aa me »|- 


nEN 


On the other hand, we can write V[X] in terms of the Fourier series 


a= > Cne(an). 


neZ 
Indeed, we have i =p! 2b (mod p) €(—ab/p)S(b/p 


) 
aIXAI=S DY | YO e-abse/ml = Yo isee/o)e 


a(modp) 6(modp) b (mod p) 


, whence 


by Parseval’s identity (10.6) for additive characters. Since we also have that 
E|X] = S(0)/p, we infer that 


VixXI= 3 DY |se/m)p. 


be(Z/pZ)* 


The arithmetic form of the large sieve 273 


Comparing the above identity with (25.27), we conclude that 


(25.28) | Kene(an/p)| > £0) Den] 


a€(Z/pZ)* neZ neZ 


Notice that we have not assumed anything about the coefficients c,, other 
than that they are supported on NV. 


More generally, we claim that 


(25.29) ys, - ene(an/a)| 2 r(a)| SS Gy ° 


a(modq) néeZ neZ 


for all g|P(y) = Tey? and all sequences (Cn)nez C C supported on N, 
where )** denotes here a sum running over integers coprime to g. We prove 
by induction on w(q). When q = 1, it holds trivially; when w(q) = 1, 
it follows by (25.28). Finally, assume that holds for all g|P(y) with 
w(q) <j, where j is some positive integer. Let q|P(y) with w(q) =j +1. 

We may write g = qq@2 with w(q1),w(q2) < j, so that holds for 
the moduli gq; and q2. Note that (q1, q2) = 1 by the fact that q is square-free. 
Consequently, when a; ranges over (Z/q,Z)* and ag ranges over (Z/qoZ)*, 
then aiq2 + a2q, ranges over (Z/qZ)*. We thus find that 


AP 9 ee ae POL ara 


a(modq) neEZ a1 (mod q1) a2 (mod q2) ce 


We apply (25.29) with q2 in place of g, and with c,e(ain/q,) in place of cy. 


Hence, 
S| Tone(ME Hl 2 el] eel) 


az (modqz) neEZ 92 


Summing the above inequality over a,j € (Z/q,Z)* and applying again 
(25.29), this time with q in place of g, we deduce that 


ST |Nene(S) > rearslaad] Def 
neZ 


a(modq) neZ 


Since f is a multiplicative function, relation (25.29) follows. This completes 
the inductive step, and hence the proof of (25.29). 


Finally, applying (25.29) with cn = Inewy, and summing it over all 
square-free q < y, we find that 


INP S? POS < S~ | de (na/a)| 
qsy q<y a(modq) nen 


The right-hand side is < (N + y?)|N’| by Theorem [25.14] thus completing 
the proof of the theorem. 


274 25. Bilinear forms and the large sieve 


Theorem 25.16] is of comparable strength to Theorem[21.1] As a matter 
of fact, if we use the improved large sieve inequality mentioned in Remark 
25.13] we can obtain a new proof of Theorem[21.4] (see also for a further 
improvement of this result). However, the true strength of the large sieve is 
revealed when the sets Ry, have unbounded cardinality on average, in which 
case the sifting dimension is also unbounded. We illustrate this point by 
studying Vinogradov’s least quadratic nonresidue problem. 


Given a prime p, we let n, denote the least quadratic nonresidue, that 
is to say, the smallest integer n > 1 for which (n|p) = —1. We know from 
elementary number theory that for half of the integers n € [1,p — 1] we 
have (n|p) = —1. In fact, since the Legendre symbol (-|p) is a non-principal 
Dirichlet character mod p, we have 


> (5) = vPlose) 


n<N 


by the Polya-Vinogradov inequality. In particular, n» = O(,/plogp). Exer- 
cise [25.6] establishes the improved bound 


(25.30) Np < pilavero()) (p > oo). 
Vinogradov conjectured that the stronger estimate 
Np = Oc(p*) 


is true for each fixed ¢ > 0. We use the large sieve to show that Vinogradov’s 
conjecture holds for the vast majority of primes. 


Theorem 25.17. Fire >0. For x > 3, we have 
#{p< U2,» > p°} <<; logloga. 


Proof. For every y > 1, we will show that 


(25.31) HAD Sy i Mp 2 yf f = O-(1). 
The theorem then follows by setting y; = e®’ and noticing that 
H{D<U:m2>P}< + > HL Yj-1 < PL Yj i Mp 2 Sys} 
j<log log x 


To show (25.31), let 
N ={m<y?:Pt(m) < y*}. 


On the one hand, |N| >>. y? by Theorem [4.5] On the other hand, we can 
use the large sieve to bound ||: for each prime p with np > y*, we let 


Ry = {a(mod p) : (alp) € {0,—1}}; 
otherwise, we let R, = 0. We claim that N avoids the sets Rp. 


Exercises 275 


Indeed, let p be a prime and n € N. If np < y*, then Rp = 0, so we 
naturally have n ¢ Rp (modp). Assume now that n, > y°. Since Pt(m) < 
y® < Np, we have (p’|p) = 1 for all p’|n. The multiplicativity of the Legendre 
symbol thus implies that (n|p) = 1, that is to say, n ¢ R,(modp), as 
claimed. 


From the above discussion, we may apply Theorem [25.16] to find that 
Wi<v?/ SO wm) s(m), 
m<y? 
where f(m) = [[pim |pl/(p — |Rpl). Since |N'| >. y”, we conclude that 
XPM) fm) = 0-(1). 
m<N 


But note that if p > 3 is a prime with n, > y*, then f(p) = (p+ 1)/(p— 1) 
by the definition of R,. In particular, 


S> w(m)f(m) > {p< yi mp > yf. 
m<N 


This proves that (25.31) holds, thus completing the proof of the theorem. 


Exercises 


Exercise 25.1. Let C = {y1,..-, xa} be a set of Dirichlet characters, where y;, 
is a character to the modulus qm. 


(a) We say that C is reduced if ¥m,Xm, is non-principal when m, # m2. Show that 
the following sets of characters are reduced: (i) any subset of {x (mod q)}; (ii) 
any set of primitive Dirichlet characters. 


(b) Assume that C is reduced and let 
Q= max{ [Gens 3 Gira | :l<m,m2<M,m, # meg }. 
Prove that 
2 
| YS canis] < + OC Qlog Q))Ie13 
xEC H<n<H+N 
for all N € Zsi, H € Z and €= (cw41,...,cH+n) € Cy. 
Exercise 25.2. Let y(modgq) be a fixed non-principal Dirichlet character, and let 
fig: N—- {2 €C: |z| < 1} be supported on [1, M] and [1, N], respectively. 


Explain why the method of bilinear forms cannot yield a general non-trivial bound 
for the sum eo) *« g)(n) x(n). 


Exercise 25.3. Adapt the proof of Theorem[25.11|to show that the factor \/log(2q) 
can be removed from the statement of Theorem Conclude that the exponent 
of log x in Theorem [23.8]can be improved from 5/2 to 2. 


276 25. Bilinear forms and the large sieve 


Exercise 25.4. Let N,J € Zsi, and set N; = |N/2|. Define the function g : 
Z —» [0,1] by letting g(n) = 1 when |n| < M, g(n) = 1—- (\n| — M1)/J when 
N, < |n| < Ni 4+ J, and g(n) = 0 when |n| > Ni + J. 

g( 


(a) Prove that ljnj<n/2 < g(n) for all n € Z and 
S¢ g(n)e(na) = J7*- [(M. + J) Fy 4s(a) — Mw, (0), 
neZ 
where Fy, denotes the Fejér kernel. 
(b) Choose an appropriate J to prove that the left-hand side of Theorem [25.11] is 
< (N+1+7vV66-1/3)|\é||3. 
Exercise 25.5 (Gallagher). 
(a) Given f € C1({a,b]) and c € [a,b], prove that 


b _ b 
FOS pao f liwlde + POPS Pp ayia, 


[Hint: Note that f°(f(x) — f(e))dx = f°(a — x) f'(x)dx and L(F@) 
— f(O)dx = fP(b- 2) f(a ‘ der. 

(b) If S(a) = Vencn+n Cne(na) and the points a1,...,a@p are 6-spaced mod 1, 
prove that 


R ‘ : 
ret a) |?da a)S"(a)|da 
3 ister) <;/ 15(a)|2d +f 15(a)8"(a)|da. 


(c) Prove that the left-hand side of 25.11is < (tN + 67!)|lé||3. 
Exercise 25.6. Let p be an odd prime and let n, be the least quadratic non-residue 
mod p. 
(a) Prove that if Pt(m) < np, then — |p) = 1. 
(b) Deduce Vinogradov’s bound ny, < p!/2V°+°)), [Hint: For all 2 > 1, show that 
De Pha )<n, (Mm 2) < 01 /Pog ) Te ies Pets m)>Np 11] 
Exercise 25.7. Assume the Generalized Riemann Hypothesis. 
(a) Let y be a non-principal character mod g. In addition, let ¢ be a smooth 
function supported on , ' with Mellin transform ®. Show that 
S> A(n) o(n/ax) « «/? log q (a > 1). 
n21 
[Hint: Exercise [[1.4]] 
(b) Prove that n, < (log p)? for all p. 


(c) Use the Chinese Remainder Theorem to prove that if x is large enough, then 
there is a prime p € [x/2,2] such that (q|p) = 1 for all eget q < 0.49 log x. 
In particular, there is an infinite sequence of primes p with n, > 0.49 log p. 


Chapter 26 


The Bombieri- 
Vinogradov theorem 


Having developed the large sieve, we present here one of its most impor- 
tant applications: a proof of the celebrated Bombieri- Vinogradov theorem 
(Theorem [18.9]. Let us recall its statement: for each fixed A > 0 we have 


li(y) 


m0) Fo <4 Taga 


(26.1) max max 
ySx@ ae(Z/qZ)* 
qI<Q 


uniformly for x > 2 and1<Q< V/x/(logx)4*°. 


This result is often called “the Riemann Hypothesis on average”. Indeed, 
the Generalized Riemann Hypothesis implies that 


: a) — Bw xz log(qz 
™(y; 4, 4) AD < yal (qa) 


(26,2) max max 

y<x ac(Z/qZ)* 
for all g € N and all x > 2 (see Exercise[11.2). Conversely, Theorem [6.1] and 
Exercise[11.3(b) show that knowing for all x > 2 and all g € N implies 
the Generalized Riemann Hypothesis. Now, observe that implies 
with Q = /z/(logx)4+?, which is bigger only by a factor of log x 
than the largest value of Q furnished by the Bombieri- Vinogradov theorem. 
In comparison, note that the Siegel-Walfisz theorem (Theorem[12.1) provides 
a much poorer range of validity of (26.1), allowing us to establish it only for 
Q < (log x)°, where C is arbitrarily large but nevertheless fixed. 


We thus see that if we need to estimate a(x; g, a) —li(x)/y(q) on average 
over q, the Bombieri- Vinogradov is just as good as the unproven Generalized 
Riemann Hypothesis. And having access to such strong averaged estimates 
is of crucial importance in sieve theory (see the discussion in Example[18.8). 


277 


278 26. The Bombieri- Vinogradov theorem 


Preliminaries 


To prove the Bombieri-Vinogradov theorem, we will decompose von Man- 
goldt’s function into type I and type II functions using Vaughan’s identity. 
We will then examine the distribution of each type in arithmetic progressions 
employing different arguments. But first we must perform some preparatory 
steps. 


Using a more natural main term. Throughout this chapter, we will 
employ the notation 


Asasaa)= fn) - 5 Stn), 


Nx NKx 
n=a (mod q) (n,q)=1 
where f is an arithmetic function, z € Rsi, q € N and a € (Z/qZ)*. This 
quantity will be small if f is well-distributed among reduced arithmetic 
progressions. Moreover, it admits a convenient representation in terms of 
Dirichlet characters: we have 


(26.3) Asuia.a) = > xa) Yo f(n)x(n). 
i = 


Now, let 1p denote the indicator function of primes. We want to replace 
li(y) in (26.i) with Dean 1, so that we can express the left-hand side of 
(26.1) using the quantity A;, to which we can apply (26.3). We have 


S> 1 = xy) + O(log g) = lily) + O(ye V4 + log gq) 
PXy, Pia 


for y > 2 and q EN, where c is the constant from Theorem [8.1] (the Prime 
Number Theorem). This reduces (26.5) to proving that 


max max |Aj,(2;q,a)|< 
me ee ee 1p( 34, )| A 


uniformly for x > 2 and1<Q< V2/(logx)4*°. 


ee 
(log x)4¥1 


Switching to von Mangoldt’s function. The next step, in preparation 
for the application of Vaughan’s identity, is to switch from 1p to A. This is 
accomplished by the following estimate. 


Lemma 26.1. For x > 2,q EN anda é (Z/qZ)*, we have 


( max |Aq(y;q,a)| + va). 


max |A 5q,a)|< 
max | ip ly q; | Jt<yse 


log x 


Preliminaries 279 


Proof. If y < \/z, we have |Ai,(y;q,a)| < /z/logxz. Assume now that 
y € [/az, x]. Chebyshev’s estimate (Theorem 2.4) implies that 


WEE 
(26.4) Yo14>N1= 0 Vis d ri <=. 
pS k22 pea PSV 1<k< P82 <a 
Hence, if y € [,/z, 2], we have 
A(n) 1 A(n) Vx 
A1,(y;4,0) = = O 
1 (¥iG4) Ds logn y(q) Dy logn 7 log x 


Va<ncy 
n=a (mod q) (n,q)=1 


Employing partial summation, we find that 


Aip(y; 4,4) -[ ——dAa(t; q,a a) +0( 


2) 


Jz logt log x 
Ax(t; , ¥ An(t; 
= A(t; q, @) +f MEO) ay o( =). 
log t ifs Jz tlog*t log x 


Applying the triangle inequality and then bounding |A,q (t; q, a)| by its max- 
imum value over t € [\/z, x] completes the proof of the lemma. 


Lemma [26.1]reduces the Bombieri- Vinogradov theorem to showing that 


& 
26.5 Aaly; 4, <A 77 
_ Pepi clefaay OMB EOI SA Tog aya 
I<Q 
uniformly for x > 2 and1<Q< V/z/(logx)4*. 


Applying Vaughan’s identity. The key to proving (26.5) is a combina- 
torial decomposition of von Mangoldt’s function in terms of type I and type 
II functions. We perform this decomposition by appealing to Vaughan’s 
identity (Lemma 23.1): we have A = A’ + A’ + Acy for some U,V € [1,2] 
to be chosen later, where we recall that 
At = pey *log — (Acu *picv)*1 and A? = (Assy *1)* psy. 

As we will see, we may work with any choice of U and V satisfying the 
conditions 


(26.6) UV < Va and U,V >ev!®?. 


The contribution of the term Acy is bounded trivially: we simply note 
that )on<y A(n) < U, whence Aq. (y3q,a) <U < \/ for all g,a € N and 
all y < x. As a consequence, 


(26.7) a. eS Ania] K QVz < ae 


for 2 > 2and1<Q< V2/(logz)4t?. 


280 26. The Bombieri- Vinogradov theorem 


It remains to study the distribution of A* and A? in reduced arithmetic 
progressions. We start with the former. 


Type I functions in arithmetic progressions 


The study of type I functions in reduced arithmetic progressions is relatively 
easyl] We have the following general result, proven by a straightforward 
application of the simplest version of Dirichlet’s hyperbola method. 


Theorem 26.2. Let v > 0, and let f be an arithmetic function supported 
on [l,y]. Forx >2,qEN andaé (Z/qZ)*, we have 


JA frtog’ (#3 4, @)| < 2(log x)” 7 | f(k) 
ky 


Proof. Given k € (Z/qZ)*, we write k for its multiplicative inverse mod q. 
Notice that if n = a(modq), then (n,q) = 1. Therefore 


Ayan tina) = D> F0)(  Cowo’- > Y oes"). 


ky e<ax/k 
(k,q)=1 =ak (mod q) (4,q)=1 


The expression inside the parentheses equals Ajog»(2/k;q, ak). Hence, the 
theorem is reduced to proving that 


(26.8) |Atog (459, 7)| < 2ogt)” (> 1, 7 € (Z/qZ)”"). 


We first prove (26.8) when v = 0. We start by noticing that 


Aili) = Try > ( > 1— dS i). 
PY jre(@]azy Sst net 
n=j (mod q) n=j' (mod q) 


Now, fix 6 € Z. Since each string of g consecutive integers contains exactly 
one integer in the class b(modq), the number of n € ZN [1,¢] in the class 
b(mod q) is either |t/q| or |t/q| +1. We deduce that |Aj(t;q,7)| < 1, which 
proves (a stronger former of) when v = 0. 


Finally, when v > 0, we use partial summation to find that 


A 
Mean = / hogeaha Gens). 
al 


Integrating by parts and using the already proven fact that |Ai(s;q,7)| <1 
yields (26.8) in this case too, thus completing the proof of the theorem. 


1The distribution of functions of multiplicative nature can be significantly more complicated 
over non-reduced progressions (see Exercise[26.1). Of course, focusing on reduced progressions is 
sufficient for applications to the theory of prime numbers. 


Type IT functions in arithmetic progressions 281 


As an immediate corollary, we have an estimate for the distribution of 
A® in arithmetic progressions, which is the “structured” part of A. 


Corollary 26.3. For U,V >1, «>2,q¢€N anda€e (Z/qZ)*, we have 


A 3q,a)| < UV logz. 
eS a At (Yq, @)| og x 


The above result readily implies the estimate 


26.9 A 3q,a)|< QUV logz. 
(26.9) ee ee al GUY le 

qI<Q 
If Q < V2z/(logx)4+3 and UV < \/z (as we assumed in (26.6)), then the 
right-hand side of (26.9) is < Q,/x(logx) < x/(logx)4+?. Together with 
(26.7), this reduces the Bombieri-Vinogradov theorem to proving that 


Cy 
26.10 A av 
(26.10) 2 BEE, acltithy An(ea.a)l « Goya 
uniformly for x > 2 and1<Q< V2/(logx)4*°. 


Type II functions in arithmetic progressions 


The main result we will use to study A’ in arithmetic progressions to large 
moduli is Theorem [25.15] (the multiplicative large sieve inequality). It turns 
out that this result can only handle the contribution of Dirichlet characters 
of large conductor to the sum in (26.10). To deal with characters of small 
conductor, we need the following estimate. 


Theorem 26.4. Fiz A,C > 1. Ifx >3,U € [1,2], V € [ev'8*, a], r © Neg 
and y is a character of modulus q < (logx)°, then 


(n 
ax 2 A’( n)1 < 
pee a ea: rj= A,C 


(log x)4 


Proof. The result is proven by a modification of the second part of the 
proof of Theorem [24.3] First, we open the convolution A?(n) = Y>,,_,,(1 * 
Asu)(k)usy(). Then, we fix the congruence class of €(modq) and use 
Corollary [[3-4] We leave the details as an exercise. 


Remark 26.5. In Chapter we noticed how the bilinear methods are 
perfectly complemented by the Siegel-Walfisz theorem, thus yielding results 
such as Theorem [24.3]that cover all possible values of a. We see this comple- 
mentarity manifesting itself again in the proof of the Bombieri- Vinogradov 
theorem: bilinear methods will handle characters of large conductor, but we 
have to resort to the Siegel-Walfisz theorem (via an application of Corollary 
to handle characters of small conductor. 


282 26. The Bombieri- Vinogradov theorem 


Finally, as a more technical remark, we mention that it is possible to 
prove (a version of) for non-principal characters by a direct appeal 
to the Siegel-Walfisz theorem, thus circumventing the use of Corollary [13.4] 
Indeed, when (a,q) = 1, r € Nez and UV < x/eV!°8*, we claim that 


max |Aa»(y;4,4)|<a,c 2/(log x)", 
VE<KYKx 

where Av (n) = A’(n)line)=1- This estimate is good enough for the purpose 
of establishing the Bombieri- Vinogradov theorem. To prove it, we start by 
writing A? = A—A'—Acy. After multiplying this identity with the indicator 
function n + 1(m,r)=1, we apply the Siegel-Walfisz theorem (‘Theorem [12.1] 
to the first function on the right side and Theorem 26.2] to the second one. 
The details are left as an exercise. 


Reduction to character sums. We now show how to use Theorem 26.4] 
to reduce the proof of (26.10) to a certain large sieve estimate involving 
sums of A? twisted by Dirichlet characters. It is convenient to introduce 
some notation for these character sums: for the rest of this chapter, we let 


M*(n)x(n)Lnrmt 
n<y 


S,- (ax = max 
r( x) JE<yse 


By (26.3) with f = A’ and the triangle inequality, we find that 


1 
max max |A,,(x3q,a) “7 Si(a, x). 
Va<yxa ac (Z/qZ)* Avo (4) on 
sens 


Now, let €(modd) be the primitive character inducing y, so that d is 
. conductor of y. We then have that d|q and y(n) = linqj=1€(n) = 
(n,q/d)=16("), Where the second equality follows by noticing me ae copri- 
ne of n and d is encoded in the definition of €(n). Hence, 


1 * 
max max _ |A,,(2;q,a)| << —~ S° > Sqja(Z, €): 


Ja<y<a a€(Z/qZ)* (9) d\q, d>1, € (mod d) 


Fix C > 0 to be chosen later. When d < (logx)°, we use Theorem 26.4] 
with A+ 2C +1 in place of A (recall that we have assumed (26.6)). There 
are < (log x)?© pairs (d,€) with d < (logx)°. Consequently, 


Saja(,§) us 
max |A,,(x;q,a <> >. ; ad pn 
as i) 9(q) Care a) ) 
a€(Z/qZ)* d>(log x)° 


Type IT functions in arithmetic progressions 283 


Since y(q) > y(d)y(r) when g = dr, we infer that 


T (a, 
(26.11) max Ay(aia.a)l <2? + o(e/ (ogy), 
LYKxL 
12 ge(z/42)* r<Q 
where 
1 
Te Gk) = ay Sp(2, ) 
d 
aseayecacdye SY) efmoaa) 


A large sieve inequality. The next step is to bound T;,(z,Q) using the 
large sieve in the form of Theorem [25.15} However, this result establishes a 
bound for the mean square (also called the “second moment”) of character 
sums >) ,,<¢ CnX(m) when we average over primitive characters x of conductor 
< Q. To prove the ur Cem hae theorem, we need to estimate 
the first moment of 5> A? (n)x(n)1(n,p)=1- It turns out that the bilinear 
structure of A? allows us to use the Cauchy-Schwarz inequality and pass 
from the first moment of }7,<, A >(n)x(n)1anr)=1 to a product of two second 
moments of two other character sums. As a matter of fact, we have the 
following estimate for general type II functions. 


Nx 


Theorem 26.6. Let f and g be two arithmetic functions supported on {1, M] 
and (1, N], — For x,Q >1 we have 


Fe aay BE LLU rat 


a<Q, x (mod q) 
< soe VMQ + VNQ + Q?)(log x) || f lz Ilglle- 


Proof. Since we are only considering integers n < x, we may assume that 
xz > M,N. Indeed, if for example x > M, then we replace f by f+ 1j2)- 


Let S be the sum in the statement of the theorem, which we write 


—_ SoS sa) | > f(m )9(n)x(n)lmn<y| + 


a<Q, x (mod q) m<M,n<N 


We want to apply the Cauchy-Schwarz inequality to S' to separate the vari- 
ables m,n and pass to a product of two second moments, so that we can 
apply Theorem to each variable separately. However, there are two 
technical obstacles. First, the variables m and n are tangled in the indicator 
function lmn<y; second, we have to take the maximum over y < x. We take 
care of both of these issues simultaneously by an application of Perron’s 
inversion formula. 


As a preparatory step, note that mn < y if and only if mn < |y| + 1/2. 
Hence, in the definition of S we may replace max,<, by maxy—,+1/2,keN, k<a- 


284 26. The Bombieri- Vinogradov theorem 


Now, let y = k+1/2 for some integer k € [1, x]. Lemmaf[7.Iwith a = 1/logz, 
T = x” and mn/y in place of y implies that 


nae! lm a, 4 (elma). 


att J Rela x?| log(y/mn)| 


We have |y — mn| > 1/2 for all integers m,n, by our assumption on y. 
Therefore, | log(y/mn)| >> 1/y >> 1/a. Moreover, (y/mn)* < 1 for y < 
x+1/2 and m,n > 1. We thus conclude that 


x atit 
LE Ampxlmaln)x()tmnew =5e f, HOIGx) at 


m<M,n<N 
(26.12) + O(e! S> [Fm X> I9(~)1), 
m<M n<N 
where 


=F AMO t= 5 HOM 


m<M n<N 


The Cauchy-Schwarz inequality and our assumption that M,N < x imply 


that 
S> |f(m)| <a? |[fll2 and S© |g(n)| < «'/? [Igllo. 


m<M n<N 


In the main term of ( , we note that |y°t™| <1 for y < x+1/2, as 
well as that |a + it| x enn \t|}. Therefore, 


FE Fomxlm)o(nx() mney < f * OIG a 4 Fallalle 
m<M,n<N —x? max{a , |t|} 


The right-hand side no longer depends on y. Consequently, 


= if peed (| 1G) = + Glial: 


q<Q,x Goad © 
By the Cauchy-Schwarz inequality and two applications of Theorem [25.15} 
one where we take cn = f(n)/n**™ for n € [1,M], and another one with 
C= Gn oo for n € [1, N], we conclude that 


S> S~ —~ - |Fi(x)| - IGe(x) | & V(M + Q?)(N + Q?)||fllallglle- 
q<Q,x Gaede)” 
Since \/(M + Q?)(N + Q?) x VMN + VMQ4+ VNQ + Q? and 


2 


- dt dt dt 
ae =<. < log x, 
_22 max{a, |t|} It|<a @ a<|t|<x? lel 


the theorem has been established. 


Type IT functions in arithmetic progressions 285 


Corollary 26.7. For x,Q > 2, U,V € [1,2] andr EN, we have 


Qzr , Qa 
> Ss r(y,x) & (f+ + oe + Q?V/z ) (log x)°. 
qI<Q, x (modq) © = ' ) 


Proof. We begin by writing A’ using (23.12), which we also multiply with 
the indicator function of integers coprime to r: 


A? (n)linr)=1 = S> (a; * Bj)(n) for n<a, 
U<2i<2x/V 
where we have set aj(k) = (Asu * 1)(k)lo-1epcailceryar and 8;(0) = 
Msv(€)1ece/2i-11(¢r)=1- Therefore, Theorem with f = a;, M = 2, 
g = 8; and N = x/2)—' implies that 
qd * 
— max 
d, (4) yy ye 
q<Q x (mod q) nxy 
K (Va + 29°Q + V22-4Q + Q?)(log x)|Ia5ll2l| B;ll2 
K (w+ 2/7, /eQ + 22-9/7Q + VeQ?) (log x)’, 
where we bounded ||q;||2||3;||2 by O(,/z log x) using the inequalities |a;| < 


log and |@;| < 1. Summing the above estimate over 2 € [U,22/V] (there 
are < log such choices for 7) completes the proof of the corollary. 


The easiest way to pass from the above estimate to a bound for T,.(x; Q) 
is to use a dyadic decomposition trick: we have 


d 
HeM< YD ga Ye gy Le’ Seo. 


(log x)° <23<2Q/r 2I-1<d<2s € (mod d) 


Corollary then implies that 


T,(%,Q) < SS : = (24 “242 WW ~ 443 V2) (loge)? 


ieee ae 
x x(logr)* ax(logx)* Qy/x(log x)? 
26.13 Ket st ES te 
( ) (log z)¢-3 VU VV r 


where we used the estimates 


sy 2-5 < (logx)~©, » 1 < logz, y << Q/r. 


25 >(log x)? 1<23<2Q/r 1<23<2Q/r 
Inserting (26.13) into (26.11) and executing the summation over r yields 
£ x(log a)? 


max |Ajs(y3q,a)| << + QV/z(log x)?. 
< Va<y< x ( 
gS <Q od (2 /y2)* 


+ 
logr)°-4 — ,/min{U, V} 


286 26. The Bombieri- Vinogradov theorem 


We take C = A+4 and U = V = ev'°8* (which satisfy (26.6), at least when 
x is large enough). This completes the proof of (26.10), and hence of the 
Bombieri- Vinogradov theorem. 


Exercises 


Exercise 26.1. Let q€& N and aeé€ Z. 
(a) If (a,q) = 1, prove that A,(x;q,a) < ./x for x > 1, as well as 


S- Habis Octo (x — 00). 
nxn q 
n=a (mod q) 


(b) Let d= (a,q), r= q/d and c= 7(4) |] i14, pp (1 — 1/[(p — 1) (vp(d) + 1)]), where 
vp(d) denotes the p-adic valuation of d. Prove that 


x r(n)~ e+ SD) toga (a — ov). 


nKx 
n=a (mod q) 


[Hint: For each Dirichlet character y(modq), evaluate the partial sums of 
m — x(m)7(dm)/r(d) by appealing to Theorem [3.2]| 


Exercise 26.2. Fix k € N and A > 0. Show there is B = B(A,k) such that 


li(y) 
Te(q) max max  |7(y;q,a) —- —<| <ka 
X wt eo ae (Z/qZ)* (Ha) aa| ~"* ( 


forz > 2and1<Q< V2/(logzx)?. [Hint: Use the Brun-Titchmarsch and the 
Cauchy-Schwarz inequalities to remove the weight 7, (q).] 


log x)A 


Exercise 26.37 Let Q,4 p denote the set of integers g > 3 such that 


er li(z) li(a) 
ac(Z/q2)* ™(23 4,4) y(q)| ~ 9(q)(log x)4 


For each A > 0, show that there is B = B(A) such that 
#248 1[1,Q] = Q + O4(Q/(log Q)*). 


Exercise 26.4* Fix A > 1 and « > 0. Using the combinatorial decomposition 
in place of Vaughan’s identity, show that holds uniformly for x > 2 
and 1 <Q < \V/z/(logx)4+?+*. [Hint: In the proof of the analogue of Theorem 
[26.7] with a;, 8; defined appropriately, show that ||a;||2||5;|]2 << vlog z/ log y.] 


for all « > q?(logq)®. 


Exercise 26.5% Prove the Bombieri-Vinogradov theorem by decomposing von 
Mangoldt’s function using Heath-Brown’s identity from Exercise 23.5] 


Chapter 27 


The least prime in an 
arithmetic progression 


We have proved that all reduced arithmetic progressions of a given mod- 
ulus q get their fair share of primes. Since this is an asymptotic result, a 
fundamental question is how far do we have to go to see the primes becoming 
equidistributed among the different progressions mod qg. A simpler version 
of this question is how far do we have to go to locate the first prime p in 
the reduced residue class a (mod q). We denote this prime by P(q, a). 

The Siegel-Walfisz theorem tells us that (x; q,a) ~ li(x)/p(q) as soon 
as © >- exp{q*}, so that P(q,a) <- exp{g®}. However, we expect that 
m(x;q,a) ~ li(a)/y(q) in the much wider range x > q!* (see Exercise[I7.6), 
which would imply that P(q,a) <- q't*. If we knew that Digaeag L(s,x) 
has no zeroes in the strip Re(s) > 1 — 6, we jaa immediately deduce 
that P(q,a) <, q'/5+® (for instance, see Exercise[I1.2). Remarkably, Linnik 
proved that such a strong bound on P(q,a) holds uncoddifienally: 


Theorem 27.1 (Linnik). There is ig absolute and effectively computable 
constant L > 1 such that P(q,a) <q” for allq >3 andaé€ (Z/qZ)*. 


Linnik’s original proof relies on three ingredients: 
1) the classical zero-free region given in Theorem {12.3} 


2) a log-free zero-density estimate which, among others, implies that, for 
each fixed C > 0, the product [J], (moa) £(8, xX) has O(1) zeroes in the 
region {s€C:a021- C/log(qT), lat 

3) the Deuring-Heilbronn phenomenon, stating that the classical zero-free 
region can be enlarged when it contains an exceptional zero. 


287 


288 27. The least prime in an arithmetic progression 


A proof along these lines is presented in and [114]. We will present 
here an alternative proof. The three above ingredients are replaced by: 
1’) the results of Chapter [22]that build on sieve methods and the theory of 
pretentious multiplicative functions; 


2’) the pretentious large sieve, as developed by Granville, Harper and Soun- 
dararajan building on ideas of Haldsz and Elliott [40]; 


3’) an argument of Friedlander-Iwaniec Chapter 24] that allows us to 
count primes when there is an exceptional zero using a zero-dimensional 
SIEVE. 


The pretentious large sieve 


Consider the bilinear form that has coefficients ayn = x(n)/¢/~(q), where 
x runs over all Dirichlet characters mod g and n € ZN [1, N]. In Example 
25.8] we proved that this bilinear form has norm < N + q. Using Theorem 
25.9] we deduce the bound 


(27.1) | Cenxtmf’ < uv + eB 


x(modq) n<N 7 


for all @€ CX. If |e,| < 1 and N > q, the right-hand side is < N?y(q)/q. 
This bound could be as big as one term on the left-hand side if, say, 
Cn = X(n). However, we should expect that the sequence (cn)N_, can cor- 
relate strongly with only a few Dirichlet characters by the approximate or- 
thogonality of the latter. Hence, if v1, ..., x, are the characters correlating 
the most with the sequence (cn)n<n, it is reasonable to guess that 


> | S> enx(n) 


XALX1 Xr f n<N 


; 2 

| = o(N%p(a)/a). 

The pretentious large sieve proves such an estimate when c, = f(n) with f 
multiplicative and bounded. Rather than presenting it in full generality, we 


develop it in a rather special case that sidesteps many of the technicalities 
of the general case (see Lemma [27.6{b) and Remark 27.7] below). 


To simplify various details, we count the primes with a logarithmic 
weight. As we will see shortly, this move allows us to bypass the use of 
Perron’s inversion formula and instead relate prime sums to L-functions 
using the simple idea behind the proof of Lemma [22.3] 


By orthogonality, we have that 


1 1 
oS 
Y<PSz x (mod q) Y<PSz 

p=a (mod q) 


The pretentious large sieve 289 


We expect the principal contribution to come from the character x = xo. 
However, in Chapters [12] and we saw that there is potentially an addi- 
tional exceptional character whose contribution we cannot control. If we 
exclude these two characters, we can show that the total contribution of the 
remaining characters is small. 


Theorem 27.2. Let q > 3. There is a real, non-principal Dirichlet charac- 
ter x; (modq) such that for all z > y > q? and all a € (Z/qZ)* we have 


> * = _7_( 3 1+ xl?) + 9), 


Y<PSz P (9) Y<PpSz 

p=a (mod q) 
Remark 27.3. In fact, if R, denotes the set of real, non-principal Dirichlet 
characters mod q, we will take x; such that Lg(1,x1) = minyer, Lg(1, x). 
This choice is motivated by Theorem b). 


The first step in the proof of Theorem 27.2] is to exploit the type I/II 
structure of von Mangoldt’s function and pass to a second moment estimate 
to which we can use the method of bilinear forms, much like we did in 
the proof of the Bombieri-Vinogradov theorem. Due to the presence of 
logarithmic weights, it suffices to use with D = 1: we have 


(27.2) A(n)1p-(nysy = Nueva”) + Mieve(D), 


Mi vel) =1p-(n)sylogn and Mive(n)= S >>> ulk)logé. 
kl=n,k>y,l>y 
P~(k)>y 


If x is a Dirichlet character and we let Ly(s, x) be defined as in Chapter 
(see (22.2)), then the Dirichlet series of A’,...x factors as 


a y n n 
(27.3) S- Nsieve(™)X(M) _ Li(s,x)(Ly"(s,x) — 1). 


ns 


n=1 
Using the above observations and Lemma [22.3] we prove the following 


preliminary result. 


Lemma 27.4. Let x > y > q°. In addition, let x1 be any real non-principal 
character mod q, and set Cz = { x (modq) : x £ x0, x1} and 


141/logy ; a 
Sq = -{ ) X(a)L,(o, x) eas (7, x) — 1)do. 
1+1/ log z xeCq 


S- = =—_( S- txt) 4s, 40(0)). 


Y<pKz P e(q) Y<pKz 
p=a (mod q) 


Then 


290 27. The least prime in an arithmetic progression 


Proof. We may assume that g > 10. Set 


1 1 
(27.4) 6(n) = ln=a(modg) — me) S> X(a)x(n) = a X(@)x(n), 
x€E{xo.x1} xXECq 
so that our goal is to estimate the sum 
yo ey 
Y<pKz P (9) Y<PKz P Y<DKz P 
p=a(modq) 


First of all, with the notational convention that A(1)/log(1) = 0, we 
claim that 


il A(n)/logn 
ES t= YA on/@)  (w>). 
D ni+1/log 
ia ed P-(n)>y 
p=a (mod q) n=a (mod q) 


Indeed, this follows from a simple adaptation of the proof of Lemma [22.3 
The two needed estimates are 


1 1 logp | logw 
< and < 
»» pert eee 2 p y(q) 


p=a (mod q) p=a (mod q) 


for w > y, which are both corollaries of the Brun-Titchmarsch inequality 
(Theorem 20.1) and partial summation, since we have assumed that y > q’. 


In addition, for character y (mod q), Lemma [22.3] implies that 


x(p) x(n logn 
eae XC T8 FO) (wy). 


y<prw ~(n)>y 
We thus find that 
d(p) _ 6(n)A(n) 
(27.5) pa = s niti/l0ew Jog n + O(1/y(q)) (wey). 
y<pow P-(n)>y 
Next, we rewrite A(n)1p-(,)sy using (27.2) and show that the contri- 
or. of A%. to the right side of (27.5) is negligible. Indeed, Theorem 


sieve 


a) and our assumption that y > q? imply that 


x [,<,(1-1 gi-1/lo 
2, ve aan ~ O( Zaytogy) 2H be Za). 


n<a,P~(n)>y 
n=b (mod q) 


Note that 6 is a g-periodic function supported on integers coprime to q. In 
addition, >) nez/qz (m) = 0 and >) ,6z/9z,|6(n)| < 3. As a consequence, 


yi 1/logy 


5(n) Az ae 
De a = on ~ p(q) logy 


Ngx l<n<a 


The pretentious large sieve 291 


for x > y. Together with partial summation, this implies that 


© 5(n) At 1 
S° a sieve (n) < 
<~ n?logn 9(q) 


Combining the above estimate with (27.5) and (27.2), we conclude that 


CA? ae = 57 iieve(™) Oj) — (w > 9). 


iS = mitl/logw logn 


(ae 1), 


We want to express the right-hand side of (27.6) as an integral. We do 
this using a trick: we may trivially arrange the summation as ay = 


y<p<z ~ Luy<p<y: Hence, applying (27.6) with w € {y, z} implies that 


oo i D. n n D. n 
3 ae. => 5( )Acival ) S> 5 )Asievel ) { O(1/~(q)) 


yyit1/ log z logn ni+1/logy log n 


Y<pSz n=1 P-(n)>y 
141/logy & Ae. n)d(n 
1+1/logz 4 n 


by the Fundamental Theorem of Integral Calculus. Using to rewrite 
d(n) in the integrand in terms of characters y € Cy, and then employing 
(27.3) to factor the Dirichlet series of A’....y completes the proof. 


sieve 


The next natural step is to apply the Cauchy-Schwarz inequality to the 
sum S, of Lemma[27.4]and use bounds like (27.1). But first, we exploit the 
fact that we are summing over the restricted set of characters Cy. 


Lemma 27.5. Let Cg = {x (modq) : x 4 xo, x1} with x1 defined as in 
Remark 203) Then |L,(o,x)| <1 for ally €Cg, y>q ando > 1. 


Proof. If x is a complex character, then |LZ,(1, x)| = 1 by Theorem 22.6{a). 
On the other hand, if y € Cy is real, then Ly(1, x) > Lg(1, x1) by the choice 
of x1, and thus L,(1,7) = 1 by Theorem [22.6{b). In all cases, we have 
|Zq(1, x)| = 1. Combining this fact with Theorem (applied with y = q 
and t = 0) yields that >7,-,<, X(p)/p = O(1) for all v > u 2 q. Finally, if 
we insert this estimate into Lemma [22.3] we infer that |L,(o,x)| < 1 for all 
y >q and all o > 1, as needed. 


By the above lemma, we have Brea x)-1<« 1 for all x € Cy. However, 
we do not want to remove the factor Ly '(g,y) —1 completely from the sum 
S, of Lemma 27.4] because this will destroy its bilinear structure. Instead, 


we note that Le (3X) = F,(s, x), where 


1/2 ; ute 
Asx) = TT (1-22) =; LA) 
P 


8 
p>y P —(n)>y 


292 27. The least prime in an arithmetic progression 


with 7, being defined by (13.3). Since |z? —1| < |z—1| for |z| < 1, we have 


1+1/logy 
s,< | D2 1L,(9, x) LFy(o.x) — Udo 
1+1/logz vel 


1+1/logy ; 4 9 1/2 
<f (Sie? O lve.) -1P) "a 
141/ log z xECa xECa 


Noticing that |tT_1/2| < 71/2, Theorem follows from the following esti- 
mate. 


Lemma 27.6. Letxk >0,y>q?>>1 and1<a<1+1/logy. 
(a) Let f be an arithmetic function with |f| <7. We have 


S- s- 2a : 1 


x (mod gq)’ 2>1 [(o ~~ 1)(log wy 


(b) In addition, we have 


Kx 


dD Ey, x)? « (log y)?. 
x (mod q) 
X#XO 
Remark 27.7. Part (b) exemplifies the idea of the pretentious large sieve: 
we exclude the principal character from the summation because we know 
that Li,(¢,x0) + co when o — 1+, whereas we know that we can control 
the size of Li,(c, x) for x # Xo using ideas from Chapter 


Proof. (a) Let fy(m) = lp-(m)syf(m). By the orthogonality of Dirichlet 
characters, we have 


(27.7) S- s. fulndx(n) 


x (mod q) ' n>1 


> ee ys 
=e 

ni>l n2>1 by) 

(n1,q)=1 n2=n (mod q) 


Since f, is supported on integers free of primes < y, we may assume that 
the above sum runs over integers n1,n2 > y. 


For a € (Z/qZ)* and x > y > q?, Theorem and our assumption 
that |f| <7, with « > 0 yield the bound 


x | fy(n n) < Zexpf 2 oe who q) (log x)!~ K(log y)* 


N<Kx Pa a rere 
n=a (mod q) ota 


Using this estimate and partial summation, we find that 
n 1 
eo Melnall 
poe n3 y(a)[(o — 1) logy] 


n2=a (mod q) 


The pretentious large sieve 293 


uniformly for a € (Z/qZ)* and o € (1,1+4+1/logy]. Together with (27.7), 
this completes the proof of part (a). 


(b) To prove the second part of the lemma, we combine the proof of part 
(a) and of Theorem We begin by splitting Li,(c, x) as 


-y 5) ae 
igs & 


P~(n)>y 


Fix j for the moment and let vj be as in Theorem with D = y)/3 and 
P = {p< y}. Moreover, set 


6j;(n) = (At *1)(n)—1p-jm)sy, sothat 0< 46; < (AP —Az)*1. 
Arguing as in the proof of (22.8), we find that 
So Mah + Wlaptown 
ne 


ys <n<yitl 


3 dF (d)x(d) 3 x(m) log(dm) 
a” m? 
d<yi/3 ys /d<m<yItt/d 

qlog(y’) — jglogy — jlogq 
< SS = (9 /d)? = y25/3 -< q2i-l 


d<yi/3 
for y > q? and y # yo. Consequently, 
x(n)6j(m) log n 
“Lilo SOU +>) Ds = 
j2l yi <n<yi+1 


We then apply the Cauchy-Schwarz inequality twice to find that 


IEy(a.x)P? < O(/a) +2] +7 Sy Ue 
jel pope 
(27.8) <= “+i | > —— 


jel yi <n<yith 
Summing over all non-principal characters y (mod q), we infer the bound 


S meurereSr 5 | ye znsebene 


x (mod q) J=1 — x(modq) yi<n<yi+t 
X#XO 


(27.9) = 1450 77S). 
j=1 


294 27. The least prime in an arithmetic progression 


Using the orthogonality of Dirichlet characters as in (27.7) and the inequality 
0 < d;(n) logn « 6;(n) log(y’) for n < y’*1, we deduce that 


2 d;(m1) > 6j(m2)_ 


oO oO 


55 < o(q)7? (logy)? 


: : Ny . : N95 
ys <nisyitt yi <ng<yitt 
(n1,q)=1 n2=n1 (mod q) 
Now, since 0 < 6; < Ap = rio Theorem implies that 
x res 
0< YO Gln) < YY AF-A)@-(4+00) « —_ 
nx dcyil? dq e(q) logy 
n=a (mod q) (d,q)=1 


for x > y’ and a € (Z/qZ)*, where we used our assumption that y > q’. 


Together with partial summation, this yields the bound S; < jte-7) (log y)?. 


Inserting this estimate into (27.9) completes the proof of the lemma. 


Endgame 


Having shown Theorem 27.2] we now pass to the proof of Linnik’s theorem. 
Let us recall that we must prove that P(q,a) < q¢" if L is large enough. We 
may assume that g > 10. Throughout, x1 is as in Theorem 


The two following cases are easy to handle. 


Case 1: y1(a) = —1. Then, Theorem 27.2] and Corollary 22.4] imply that 


S> ~=——( S- 110) + 0(1)) = =a ( S> ~~ 0(1)) 
ey OP VN, 2 AG) Noe 

p=a (mod q) 

for all z > y > q?. We take z = q’ and y = q’ to find that the right-hand 
side of the above inequality is > (log L — O(1))/p(q). Hence, if L is large 
enough, we immediately deduce that P(q,a) < z= q’. 

9-99 


Case 2: y1(a) = 1 and L,(1, x1) > L~°%. In this case, we let y = q and 
z=q'. Theorem then implies that eee x1(p)/p = O(1). Together 
with Theorem [27.2] this yields the estimate 


1 1 1 log i+ O0(1 
Se PO) me 
Y<PKz P Pd pepe Pq 
p=a (mod q) 


If L is large enough, the above expression is positive. Hence, P(a,q) < z= 


q”, as needed. 


The last case remaining is thus: 


Case 3: yi (a) = 1 and L,(1,x1) < L~°%°. Under the second assumption, 
Exercise implies that L(s, x1) has a zero 6; > 1— O(L~°-°°/log q). We 


Endgame 295 


do not need this fact, but it is useful to keep it in mind. We will use sieve 
methods to detect primes in the arithmetic progression a (modq). The key 
observation is that if n is square-free, then (1 * x1)(n) = 0, unless n has no 
prime factors p with yi(p) = —1. Now, implies that y1(p) = —1 for 
the vast majority of primes p € [q,q!/"s('x))]. This means that weighing 
n with the function (1 * x1)(n) presieves it with most of the large primes. 
Hence, if we let 


S(x,y; q,4) = > (1 * x1)(n) 
l<n<a,P~(n)>y 
n=a (mod q) 


and we choose x and y appropriately in terms of g, we expect that 


(27.10) YS (+x) = Sz, Va 4,4) © S(x,y; 4,4) 


VE<psa 
p=a (mod q) 


with y much smaller than «. On the other hand, the Fundamental Lemma 
of Sieve Theory is very efficient at estimating S(x,y; q,a@) for small y, as the 
following result demonstrates. 


Proposition 27.8. Assume the above notation. In addition, let q > 10, 
be (Z/qZ)*, x > q, y € [¢, 2] and u=loga/logy. Then 


S(x,y3q,b) = (1+ x1(b) + O(a71/osyy) Phu x1) 1- 1 
(4) II ) 


Proof. Note that S(x,y; q,b) = S(A,P), where A = (an)°°, with 


an = (1 * x1)() : li<a, n=b (mod q) 
andP={p<y:pt{q}. We will apply Theorem[I8.11] We must first check 
Axioms 

Fix, for the moment, d|P such that d < \/x. We have 


Aqg= > (1* x1)(m) = > x(k). 
n<a, d\n kl<a, d|ké 
n=b (mod q) kl=b (mod q) 
Notice that y1(@) = y1(b)x1(k) in the above sum. We then split Ag into 
three subsums: in the first one k < @, in the second one ¢ < k, and in the 
third one k = £. Since xi(¢) = v1(b)x1(k), the second subsum equals y1(b) 
times the first subsum. Moreover, the third subsum is O(,/2), since it has 
< \/x terms all of magnitude < 1. Hence, 


Aa = (1+x1()) > x(k) S 1+O(v2), 
k<Va k<é<a/k, C=kb (mod q) 
€=0 (mod d/(d,k)) 


296 27. The least prime in an arithmetic progression 


where k denotes the multiplicative inverse of k (modq). Since d|P, we have 
(d,q) = 1. Hence, the sum over @ equals (x/k — k)/(dq/(d,k)) + O(1). We 
thus infer that 


Ag= (+10) rath) A (a4) + O(a). 


k<Jfa 
Using the identity (d,k) = Yomjaxn P(m), and letting k = mr, we find that 


Ag = BOS cman) val) = mr) + 0(va), 
m|d r<Jfa/m 


The Poélya-Vinogradov inequality (Theorem [10.6) and partial summation 
yield that 


a 
> xi(r)mr < /zqlogq and > x1(r) ee < ./xq log q. 


r<J/z/m r>/z/m 
Consequently, 
_ A+ x1(0))2£L(1, x1) yr elm) xi(m) | 
Ag oF s am + O(,/zq log q). 


We thus conclude that the pair (A,P) satisfies Axiom []] with X = (1+ 
x1(6))@L(1, x1)/9, v(d) = dona P(™)x1(m)/m and rg = O(,/zqlog gq). Ax- 
iom PB] also holds with « = 2. Finally, since xL(1,y1)/q > x/(q°/? log? q) 
by Theorem and we have assumed that x > q!°° and y > 10, we have 
dlp, d<xt/100 |Tal < a-'/le8¥7(1, y1)/(qlog? y). Noticing that 1—v(p)/p = 
(1—y1(p)/p)(1—1/p) for p ¢ q, the lemma follows from Theorem[I8.11{a). 


We also need a stronger version of the first part of Theorem [22.5 


Lemma 27.9. Let y(modq) be a real, non-principal character. Then 


> 1+x(p) » log z eg ied Ye ge ye eco wy. 
Pp logQ 


Y<pKz 


Proof. We may assume that g > 10, as well as that y > q := q'°, since 
een 1/p = O(1). The non-negativity of 1 * x implies that 


1+ x(p) (1 * x)(n) 
= ——— 
(27.11) oa 7 S- 
Y<pKz Y<nkz 
P-(n)>a 
If we let a = Lg, (1, x) [Ip<q, (1 — 1/p) = 1/log Q, we have 
gi-l/loga 
21:12 1 = —_———. 
(27.12) » (1 * x)(n) ar +0( eee ) 


n<«,P~(n)>q1 


Endgame 297 


for x € [y,z]. Indeed, this follows by breaking the summation according to 
the congruence class of n (mod q) and by applying Proposition 27.8] to each 
subsum with x in place of x1, while noticing that Ly,(1,x) < 1 from The- 
orem [22.1] Inserting into (27.11) via partial summation completes 
the proof. 


Proof of Theorem in Case 3. Our starting point is the first equal- 
ity in (2710). We take x = g!”” andy = g°!"8", Since 1/Lq(1, x1) > L°”, 
Lemma 27.9] implies that 


1+ 1 
(27.13) > xi(P) Fr 
y<pu P L 
that is to say, y1(p) = —1 for most p € (y,z]. We then use a variation of 


Buchstab’s identity (19.12) to find that 
S(x,/x;4,4) = S(x,y;q,a)- > S> (1* x1)(n) 


Y<pSVz n<x, P~ (n)=p 
n=a (mod q) 


= S(z,y;q,a)- SS” (1*x1)(v™)S(a/p™, D9, Ba), 
m1, y<p</«z 
where b denotes the inverse of b(modq). When m > 2, we use the trivial 


bound S(2/p™,p;q,p'a) < Lin<a/pm T(n) K a(logx)/p™. For the sum- 
mands with m = 1, we note that (1 * x1)(p) = 2- 1,,(p)=1- Consequently, 


S(z, /a;9,4) = S(2,y;4¢,a)-2 >> S(e/p.pi4.pa) + 0(* 28"), 


Y<DS Va 
x1(p)=1 


Assuming that L is large enough, we may apply Proposition27.8]to all terms 
of the right side. This yields the estimate 


seviona (et & fous) 3) 


Y<PSVx PLy 
x1(p)=1 


+ O(ax(log x) /y). 


Employing (27.13) and the lower bound L(1, x) > 1/(,/qlog? q) from The- 
orem [12.8] we conclude that 


S(a,/2;q,0) = {2+0(L-1? + o-¥/2y1 Phu (Ls Xa) II (1 _ *). 


eM se, P 


If we take L and x large enough, we have S(a,./xz;q,a) > 0. This completes 
the proof of Theorem 27.1] in Case 3 too. 


298 27. The least prime in an arithmetic progression 


Exercises 


Exercise 27.17 Assume the set-up of Lemma a) and fix r € Ryo. 
(a) For all T € R, prove that 


3 i. > f(n )(log n)" x(n ) Minx (g—1)'-*" 


ot+it ’ = 2K 
oon a n (7 — (loz yw) 
P° (n)>y 


[Hint: First, reduce to the case T = —1/2. Then, for any smooth function 
g: R— Ryo majorizing 1|_-1/2,1/2], show that the above sum is 
Tn (M1) Tx (2) (log n1)" (log n2)" [glog(m1/n2))| 


(nin2)? 


< 9(¢) 


n1,N2g>1, P~ (nin2)>y 
n1=Nn2 (mod q) 


To estimate this new sum, show that g(€) < 1/(1+ |€|"***?), and then split 
the range of n1, nz into intervals of the form (e’, e/*+].] 
(b) Deduce that 


f(n)x(n)(logn)”|° dt (o — 1)1-?" 
ee 1+P <"" [oo 1)loen) 


x (mod q)~ — n>1 
Po (n)>y 


Exercise 27.27 For each gq > 3, show that there is a real, non-principal Dirich- 
let character 1 (modq) such that for any fixed smooth and compactly supported 
function g : Ryo — R and any fixed e > 0 we have 


ee a x1 (a) 
2s Nndotn/a) = or | aor 9(q) 


n=a (mod q) 


do x(n)A(n)g(n/a) 


no>1 


+0y,<(xué'**/p(q)) 
uniformly for x = q“ with u > 1. [Hint: The case u < 100 is trivial. For the case 
when u > 100, decompose A une (23.14) with D = 1 and y = max{q’, (log x)°°}. 
To control the contribution of Nieve’ use the Fundamental Lemma of Sieve Theory. 
To control the contribution of A, use Mellin inversion to find that 


sieve? 


(8, x))Ly(s, x) | 
S- | ee Sieeel a(n/2)} Ky zt See = 4. £100 dt, 


xECg nel xECg ter 


where Cy = {x (modq) : x # xX0,x1 } anda=1+41/logz. After applying Cauchy- 
Schwarz, the integrals must be split into two ranges: when |t| < y, the bound 
Ly'(s,x) < 1, Exercise and a suitable adaptation of Lemma 2Z6{b) can be 
used. When |t| > y > (log )!°°, Exercise 27.1] suffices. | 


Part 6 


Local aspects of the 
distribution of primes 


Chapter 28 


Small gaps between 
primes 


The Prime Number Theorem establishes important global aspects of 
the distribution of primes but it does not reveal much about their statistical 
properties at a microscopic scale. We dedicate this last part of the book to 
the study of the local behavior of the sequence of primes. Firstly, we study 
how close successive members of this sequence can get. 


There are ~ x/logx primes < x, so the average gap between them is 
~ log x. On the other hand, the twin prime conjecture predicts that the gap 
equals 2 infinitely often. Given that this conjecture is out of reach, we set a 
more modest goal: if pj < po < pg <--- are the primes in increasing order, 
we want to show that lim infp_,.5(Pn+41 — Pn) < oo. Note that if this is true, 
then we immediately deduce the existence of some s € N such that there 
are infinitely many primes p with p+ 2s also being prime. Hence, there is 
at least one even number satisfying Polignac’s conjecture. 


Remarkably, an even stronger result can be proved. 


Theorem 28.1. For each m €N, we have 


lim inf (ppm — Pn) < em’. 
n> co 


The case m = 1 of Theorem is due to Zhang [188], whereas the 
case m > 1 was proven independently by Maynard and Tao [17]. 
Granville’s article Section 1.4] contains an extended account of the fas- 
cinating developments that led to this major breakthrough. 


The main goal of this chapter is to give a proof of Theorem [28.1 






The GPY sieve 


The basic strategy to detect small gaps between primes is due to Goldston, Pintz and Yıldırım. They used their method, now called the GPY sieve after them, to prove that the normalized gap $(p_{n+1} - p_n)/\log n$ becomes arbitrarily small infinitely often. The main idea is to find weights $w_n \ge 0$ such that

$$\sum_{N<n\le 2N} w_n\Big(\sum_{1\le s\le H}\mathbf 1_{\mathbb P}(n+s) - m\Big) > 0 \tag{28.1}$$

for $H$ as small as possible, where $\mathbf 1_{\mathbb P}$ denotes the indicator function of the set of primes, as usual. Indeed, if this is the case, then there must exist some $n \in (N, 2N]$ for which at least $m+1$ of the "shifts" $n+1, \dots, n+H$ are primes.


To conceptualize the above task, it is helpful to adopt a more probabilistic point of view. The weights $w_n$ naturally induce a probability measure on $\mathbb Z \cap (N, 2N]$ via the relation

$$\mathbb P_{N<n\le 2N}(n) = \frac{w_n}{\sum_{N<m\le 2N} w_m}.$$

In this notation, (28.1) becomes

$$\mathbb E_{N<n\le 2N}\Big[\sum_{1\le s\le H}\mathbf 1_{\mathbb P}(n+s)\Big] > m. \tag{28.2}$$

Hence, our goal is to find a probability measure on $\mathbb Z \cap (N, 2N]$ that is sufficiently concentrated on integers $n$ for which many of the shifts $n+1, n+2, \dots, n+H$ are primes.
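The step from (28.1)/(28.2) to the existence of a good $n$ is an averaging (pigeonhole) argument: if the weighted average of the number of prime shifts exceeds $m$, then some $n$ with $w_n > 0$ has at least $m+1$ prime shifts. The Python sketch below checks this on a toy scale, assuming the hypothetical uniform weights $w_n = 1$ and the illustrative parameters $N = 1000$, $H = 30$, $m = 3$; it is not the choice of weights used later in the chapter.

```python
def is_prime(n):
    """Deterministic trial division; adequate for the small n used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def primes_in_shifts(n, H):
    """Number of primes among n+1, ..., n+H."""
    return sum(is_prime(n + s) for s in range(1, H + 1))

def weighted_expectation(N, H, w):
    """E over N < n <= 2N, with P(n) = w(n) / sum_m w(m), of primes_in_shifts(n, H)."""
    ns = range(N + 1, 2 * N + 1)
    total = sum(w(n) for n in ns)
    return sum(w(n) * primes_in_shifts(n, H) for n in ns) / total

def uniform(n):
    """Hypothetical weights: w_n = 1 for every n."""
    return 1.0

N, H, m = 1000, 30, 3
E = weighted_expectation(N, H, uniform)
witnesses = [n for n in range(N + 1, 2 * N + 1)
             if uniform(n) > 0 and primes_in_shifts(n, H) >= m + 1]
print(E, witnesses[:5])
```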

As we saw in the discussion of Cramér's model at the end of Chapter 17, the numbers $n+1, n+2, \dots, n+H$ have strong multiplicative dependencies stemming from their reductions modulo small primes. To account for this, we consider integers $1 \le s_1 < s_2 < \cdots < s_k \le H$ forming an admissible¹ $k$-tuple $(s_1, \dots, s_k)$ and aim to show that

$$\mathbb E_{N<n\le 2N}\Big[\sum_{1\le j\le k}\mathbf 1_{\mathbb P}(n+s_j)\Big] > m. \tag{28.3}$$


The weights $w_n$ must be chosen in a way that achieves two things simultaneously: (i) they correlate strongly enough with the indicator function of the event that many of the shifts $n+s_1, \dots, n+s_k$ are prime; (ii) they allow the estimation of the left-hand side of (28.3) unconditionally. Condition (i) rules out choices such as $w_n = 1$, and condition (ii) rules out choices such as $w_n = \prod_{j=1}^k\mathbf 1_{\mathbb P}(n+s_j)$. Instead, we use sieve theory to "interpolate" between these two extreme examples.


¹Recall that this means that, for each prime $p$, the reductions $s_j \pmod p$, $1 \le j \le k$, do not cover $\mathbb Z/p\mathbb Z$.
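Admissibility is a finite check: when $p > k$, the $k$ residues $s_j \pmod p$ cannot exhaust all $p$ classes, so only the primes $p \le k$ need to be tested. A short Python sketch (the function names are our own); the tuple formed by the first $k$ primes exceeding $k$ anticipates the choice made at the end of the proof of Theorem 28.1.

```python
import math

def primes_up_to(x):
    """Primes <= x by trial division (fine for the small x used here)."""
    return [p for p in range(2, x + 1) if all(p % q for q in range(2, math.isqrt(p) + 1))]

def is_admissible(shifts):
    """True if, for every prime p, the residues s_j mod p miss at least one class of Z/pZ."""
    k = len(shifts)
    # For p > k the k residues cannot cover all p classes, so only p <= k matters.
    return all(len({s % p for s in shifts}) < p for p in primes_up_to(k))

def first_primes_above(k, count):
    """The first `count` primes that are > k."""
    out, n = [], k + 1
    while len(out) < count:
        if n > 1 and all(n % q for q in range(2, math.isqrt(n) + 1)):
            out.append(n)
        n += 1
    return out

print(is_admissible((1, 3, 7)), is_admissible((1, 3, 5)))  # True False
```

The triple $(1, 3, 5)$ fails because its residues modulo 3 are $1, 0, 2$.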




The Maynard-Tao weights 


The original choice of $w_n$ by Goldston, Pintz and Yıldırım was to consider the Selberg-type sieve weights

$$w_n^{\mathrm{GPY}} = \Big(\sum_{d\mid Q(n)}\lambda(d)\Big)^2 \quad\text{with}\quad Q(n) = \prod_{j=1}^k(n+s_j), \tag{28.4}$$

and $\lambda$ an arithmetic function to be determined. However, Maynard and Tao discovered that it is much more efficient to work with a multidimensional version of the above weights: given $\lambda : \mathbb N^k \to \mathbb R$, they defined

$$w_n^{\mathrm{MT}} = \Big(\sum_{d_j\mid n+s_j\ \forall j}\lambda(d_1, \dots, d_k)\Big)^2. \tag{28.5}$$


For both of the above choices, the left-hand side of (28.3) can be computed under rather general assumptions on $\lambda$. As in the study of Selberg's sieve, the goal is then to optimize the choice of $\lambda$.
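To make (28.4) and (28.5) concrete, the following Python sketch evaluates both weights for a single $n$. The truncated Möbius choices of $\lambda$ (cut off at a level $R$) are hypothetical toy examples, not the optimized weights of this chapter. When every $n+s_j$ is a prime larger than $R$, only trivial divisors survive and both weights equal $\lambda(1)^2 = 1$; for other $n$ the two constructions can differ.

```python
import math
from itertools import product

def divisors(n):
    """All positive divisors of n, via trial division up to sqrt(n)."""
    small = [d for d in range(1, math.isqrt(n) + 1) if n % d == 0]
    return sorted(set(small + [n // d for d in small]))

def mobius(n):
    """Möbius function by trial factorization."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def gpy_weight(n, shifts, lam):
    """w_n^GPY = (sum over d | Q(n) of lam(d))^2 with Q(n) = prod_j (n + s_j)."""
    Q = math.prod(n + s for s in shifts)
    return sum(lam(d) for d in divisors(Q)) ** 2

def mt_weight(n, shifts, lam):
    """w_n^MT = (sum over tuples (d_1, ..., d_k) with d_j | n + s_j of lam(d_1, ..., d_k))^2."""
    choices = [divisors(n + s) for s in shifts]
    return sum(lam(ds) for ds in product(*choices)) ** 2

R = 5  # hypothetical truncation level for the toy weights

def lam_one_dim(d):
    """Toy GPY-type lambda: mu(d) if d <= R, else 0."""
    return mobius(d) if d <= R else 0

def lam_multi_dim(ds):
    """Toy Maynard-Tao-type lambda: mu(d_1...d_k) if d_1...d_k <= R, else 0."""
    prod_d = math.prod(ds)
    return mobius(prod_d) if prod_d <= R else 0

print(gpy_weight(10, (1, 3, 7), lam_one_dim), mt_weight(10, (1, 3, 7), lam_multi_dim))  # 1 1
```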


Various technical details are simplified if we "presieve" the support of the weights $w_n$ by all primes $\le y$. There are two main ways of accomplishing this. The first one is to restrict the support of $w_n$ to integers $n \equiv a \pmod{P(y)}$ for some slowly growing $y$ and an appropriate congruence class $a \pmod{P(y)}$. This is the approach taken in [138, 170]. The second one, which we opt for here, is to slightly modify the weights of (28.5) by applying what is called a "preliminary sieve". By this we mean that the small prime factors of $Q(n)$ will be handled separately, using a simpler sieve.


To define the weights $w_n$ that we will use, we introduce the parameters

$$D = N^{1/4}e^{-\sqrt{\log N}}, \qquad y = \exp\{(\log\log N)^2\}, \qquad Y = \exp\{(\log\log N)^3\}.$$


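We then set

$$w_n = \Big(\sum_{m\mid Q(n)}\mu^+(m)\Big)\Big(\sum_{d_j\mid n+s_j\ \forall j}\lambda(d_1, \dots, d_k)\Big)^2, \tag{28.6}$$

where:

• $\mu^+$ is the sieve weight² $\lambda^+$ constructed in Theorem with $\kappa = k$, $\mathcal P = \{p \le y\}$ and $u = \log\log N$. In particular, $|\mu^+| \le 1$ and $\mu^+$ is supported on $\{d \le Y : d \mid P(y)\}$.

• $\lambda : \mathbb N^k \to \mathbb R$ is a uniformly bounded function supported on

$$\mathcal D := \{(d_1, \dots, d_k) \in \mathbb N^k : d_1\cdots d_k \le D,\ P^-(d_j) > y\ (1 \le j \le k)\}.$$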


²We use the letter $\mu^+$ instead of $\lambda^+$ to avoid confusion with the function $\lambda$.




Remark 28.2. What is important in the definition of the parameters $D$, $Y$, $y$ is that $y$ and $N^{1/4}/D$ are both bigger than any fixed power of $\log N$, $D$ grows polynomially in $N$, and $\log Y/\log y$ is larger than [...] by a factor going to infinity. A good exercise is to check that any such choice of $D$, $Y$, $y$ is sufficient for the proof of Theorem 28.1 to go through.


Calculations 


Assuming that the weights $w_n$ are given by (28.6), our task is to estimate the quantity

$$\mathbb E_{N<n\le 2N}[\mathbf 1_{\mathbb P}(n+s_\ell)] = \frac{\sum_{N+s_\ell<p\le 2N+s_\ell}w_{p-s_\ell}}{\sum_{N<n\le 2N}w_n} \tag{28.7}$$

for each $\ell = 1, \dots, k$. All implicit constants in this section might depend on $k$, the choice of the $k$-tuple $(s_1, \dots, s_k)$ and the supremum norm $B := \|\lambda\|_\infty$.


We will also make use of the following notation:

$$\nu(d) = \#\{n \in \mathbb Z/d\mathbb Z : Q(n) \equiv 0 \pmod d\} \quad\text{and}\quad V = \prod_{p\le y}\Big(1 - \frac{\nu(p)}{p}\Big).$$
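The quantities $\nu(p)$ and $V$ are easy to compute for a concrete tuple. In the following Python sketch (function names are our own), $V$ vanishes exactly when some $p \le y$ has $\nu(p) = p$, that is, when the tuple fails to be admissible at a prime $p \le y$. For $p > k$ not dividing any difference $s_i - s_j$, one has $\nu(p) = k$.

```python
import math
from fractions import Fraction

def primes_up_to(x):
    """Primes <= x by trial division."""
    return [p for p in range(2, x + 1) if all(p % q for q in range(2, math.isqrt(p) + 1))]

def nu(d, shifts):
    """nu(d) = #{n mod d : Q(n) = prod_j (n + s_j) is divisible by d}."""
    return sum(1 for n in range(d) if math.prod(n + s for s in shifts) % d == 0)

def V(y, shifts):
    """V = prod over primes p <= y of (1 - nu(p)/p), as an exact fraction."""
    result = Fraction(1)
    for p in primes_up_to(y):
        result *= 1 - Fraction(nu(p, shifts), p)
    return result

print(nu(3, (1, 3, 7)), V(10, (1, 3, 7)), V(10, (1, 3, 5)))
```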


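Lemma 28.3. Assume the above set-up and define

$$\xi(a_1, \dots, a_k) = \sum_{(a_1m_1, \dots, a_km_k)\in\mathcal D}\frac{\lambda(a_1m_1, \dots, a_km_k)}{m_1\cdots m_k}.$$

For any fixed $A > 0$, we have

$$\sum_{N<n\le 2N}w_n = VN\sum_{(a_1, \dots, a_k)\in\mathcal D}\frac{\xi(a_1, \dots, a_k)^2}{a_1\cdots a_k} + O_A(N/(\log N)^A).$$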


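Proof. For brevity, we write $\mathbf d$ to denote the $k$-tuple of integers $(d_1, \dots, d_k)$. If $\mathbf d, \mathbf e \in \mathcal D$ are such that $d_j, e_j \mid n + s_j$ for each $j$, then $(d_ie_i, d_je_j) \mid s_i - s_j$ for $i \ne j$. Since the numbers $d_ie_i$ and $d_je_j$ have no prime factors $\le y$, they must be coprime as soon as $y > s_k - s_1 \ge |s_i - s_j|$, which we assume from now on. Consequently,

$$\sum_{N<n\le 2N}w_n = \sum_m\mu^+(m)\sum_{\substack{\mathbf d,\mathbf e\in\mathcal D\\ (d_ie_i,d_je_j)=1\ \forall i\ne j}}\lambda(\mathbf d)\lambda(\mathbf e)\sum_{\substack{N<n\le 2N,\ m\mid Q(n)\\ [d_j,e_j]\mid n+s_j\ \forall j}}1.$$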


By assumption, $\mu^+$ is supported on integers $m \mid P(y) = \prod_{p\le y}p$, whereas $\lambda$ is supported on tuples $(d_1, \dots, d_k)$ with $(d_j, P(y)) = 1$ for all $j$. Since we also know that $(d_ie_i, d_je_j) = 1$ for $i \ne j$, the Chinese Remainder Theorem implies that there are precisely $\nu(m)$ values of $n$ modulo $m\prod_{j=1}^k[d_j, e_j]$ such that $m \mid Q(n)$ and $[d_j, e_j] \mid n + s_j$ for $j = 1, \dots, k$. Therefore,

$$\sum_{\substack{N<n\le 2N,\ m\mid Q(n)\\ [d_j,e_j]\mid n+s_j\ \forall j}}1 = \frac{\nu(m)N}{m\prod_{j=1}^k[d_j,e_j]} + O(\nu(m)). \tag{28.8}$$
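We thus arrive at the estimate

$$\sum_{N<n\le 2N}w_n = V^+N\sum_{\substack{\mathbf d,\mathbf e\in\mathcal D\\ (d_ie_i,d_je_j)=1\ \forall i\ne j}}\frac{\lambda(\mathbf d)\lambda(\mathbf e)}{\prod_{j=1}^k[d_j,e_j]} + O(R),$$

where

$$V^+ = \sum_m\frac{\mu^+(m)\nu(m)}{m} \quad\text{and}\quad R = \sum_m\sum_{\mathbf d,\mathbf e\in\mathcal D}|\mu^+(m)\lambda(\mathbf d)\lambda(\mathbf e)|\,\nu(m).$$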


We have $R = O(N^{2/3})$. To see this, we use that $\nu(m) \le \tau_k(m)$ for square-free $m$, $|\mu^+| \le 1$, $\|\lambda\|_\infty = B = O(1)$, $\mu^+$ is supported on $[1, Y]$, and $\lambda$ is supported on tuples $(d_1, \dots, d_k)$ with $d_1\cdots d_k \le D < N^{1/4}$.

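In addition, we have $V^+ = V(1 + O_A(1/(\log N)^{A+3k}))$ by Theorem 19.1, as well as

$$\sum_{\mathbf d,\mathbf e\in\mathcal D}\frac{|\lambda(\mathbf d)\lambda(\mathbf e)|}{\prod_{j=1}^k[d_j,e_j]} \le B^2\sum_{m_1\cdots m_k\le\sqrt N}\frac{\tau_3(m_1)\cdots\tau_3(m_k)}{m_1\cdots m_k} \ll (\log N)^{3k}, \tag{28.9}$$

where we set $m_j = [d_j, e_j]$ and used the fact that the equation $m_j = [d_j, e_j]$ has $\prod_{p^\nu\| m_j}(2\nu+1) \le \tau_3(m_j)$ solutions.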
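The counting fact used at the end of this paragraph can be checked by brute force: for each prime power $p^\nu \| m$, the exponent pair of $(d, e)$ at $p$ must have maximum exactly $\nu$, which gives $2\nu + 1$ choices, while $\tau_3(p^\nu) = (\nu+1)(\nu+2)/2 \ge 2\nu + 1$. A minimal Python sketch (helper names are our own):

```python
import math

def factorize(n):
    """Prime factorization of n as a dict {p: exponent}."""
    f, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            f[p] = f.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def lcm_pairs(m):
    """Number of pairs (d, e) of positive integers with [d, e] = m (brute force)."""
    divs = [d for d in range(1, m + 1) if m % d == 0]
    return sum(1 for d in divs for e in divs if math.lcm(d, e) == m)

def lcm_pairs_formula(m):
    """Product over p^v || m of (2v + 1)."""
    return math.prod(2 * v + 1 for v in factorize(m).values())

def tau3(m):
    """Number of ordered triples (a, b, c) with abc = m."""
    return math.prod((v + 1) * (v + 2) // 2 for v in factorize(m).values())

print(lcm_pairs(12), lcm_pairs_formula(12), tau3(12))  # 15 15 18
```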


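Putting everything together, we conclude that

$$\sum_{N<n\le 2N}w_n = VN\sum_{\substack{\mathbf d,\mathbf e\in\mathcal D\\ (d_ie_i,d_je_j)=1\ \forall i\ne j}}\frac{\lambda(\mathbf d)\lambda(\mathbf e)}{\prod_{j=1}^k[d_j,e_j]} + O_A(N/(\log N)^A).$$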


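Next, we remove the conditions that $(d_ie_i, d_je_j) = 1$ for $i \ne j$. Since $\mathbf d, \mathbf e \in \mathcal D$, we have $(d_ie_id_je_j, P(y)) = 1$. Hence, if $(d_ie_i, d_je_j) > 1$ for some $i \ne j$, there must exist a prime $p > y$ dividing both $[d_i, e_i]$ and $[d_j, e_j]$. Setting $m_r = [d_r, e_r]$ for $r = 1, \dots, k$, we conclude that

$$\sum_{\substack{\mathbf d,\mathbf e\in\mathcal D\\ (d_ie_i,d_je_j)>1\text{ for some }i\ne j}}\frac{|\lambda(\mathbf d)\lambda(\mathbf e)|}{\prod_{j=1}^k[d_j,e_j]} \le B^2\sum_{i\ne j}\sum_{p>y}\sum_{\substack{m_1\cdots m_k\le\sqrt N\\ p\mid m_i,\ p\mid m_j}}\frac{\tau_3(m_1)\cdots\tau_3(m_k)}{m_1\cdots m_k} \ll \frac{(\log N)^{3k}}{y}.$$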


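Hence, we have arrived at the estimate

$$\sum_{N<n\le 2N}w_n = VN\sum_{\mathbf d,\mathbf e\in\mathcal D}\frac{\lambda(\mathbf d)\lambda(\mathbf e)}{\prod_{j=1}^k[d_j,e_j]} + O_A(N/(\log N)^A). \tag{28.10}$$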




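The next step is to rewrite the terms $1/[d_j, e_j]$. To do so, we use that

$$\frac{1}{[d_j,e_j]} = \frac{(d_j,e_j)}{d_je_j} = \frac{1}{d_je_j}\sum_{a_j\mid(d_j,e_j)}\varphi(a_j),$$

just like we did when we studied Selberg's sieve. We thus deduce that

$$\sum_{N<n\le 2N}w_n = VN\sum_{\mathbf a\in\mathcal D}\frac{\varphi(a_1)\cdots\varphi(a_k)}{a_1\cdots a_k}\cdot\frac{\xi(\mathbf a)^2}{a_1\cdots a_k} + O_A(N/(\log N)^A).$$

Finally, we remove the factors $\varphi(a_j)/a_j = \prod_{p\mid a_j}(1 - 1/p)$. Note that $\omega(a_j) \le \log a_j \le \log N$, as well as $(a_j, P(y)) = 1$ for all $j$. Consequently,

$$1 \ge \frac{\varphi(a_j)}{a_j} \ge (1 - 1/y)^{\omega(a_j)} \ge 1 - O((\log N)/y) \tag{28.11}$$

by our choice of $y$. Bounding the total contribution of the error terms using (28.9) completes the proof of the lemma.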
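The two algebraic facts behind this proof can be verified exactly with rational arithmetic: the identity $1/[d,e] = (de)^{-1}\sum_{a\mid(d,e)}\varphi(a)$, and the resulting diagonalization of the quadratic form. The Python sketch below does this in the one-dimensional case $k = 1$, where the diagonalization reads $\sum_{d,e}\lambda(d)\lambda(e)/[d,e] = \sum_a \varphi(a)\xi(a)^2/a^2$. The random $\lambda$ on $[1, 40]$ is a hypothetical toy input, and the condition $P^-(d) > y$ is dropped since the identity does not need it.

```python
import math
import random
from fractions import Fraction

def phi(n):
    """Euler's totient by direct count (fine for small n)."""
    return sum(1 for a in range(1, n + 1) if math.gcd(a, n) == 1)

def lcm_identity_holds(d, e):
    """Check 1/[d, e] = (1/(de)) * sum over a | (d, e) of phi(a)."""
    g = math.gcd(d, e)
    rhs = Fraction(sum(phi(a) for a in range(1, g + 1) if g % a == 0), d * e)
    return Fraction(1, math.lcm(d, e)) == rhs

def quadratic_form(lam):
    """Sum over d, e of lam(d) lam(e) / [d, e] (case k = 1)."""
    return sum(lam[d] * lam[e] / math.lcm(d, e) for d in lam for e in lam)

def diagonal_form(lam):
    """Sum over a of phi(a)/a^2 * xi(a)^2, where xi(a) = sum over m of lam(am)/m."""
    D = max(lam)
    total = Fraction(0)
    for a in range(1, D + 1):
        xi = sum(lam.get(a * m, 0) / Fraction(m) for m in range(1, D // a + 1))
        total += Fraction(phi(a), a * a) * xi ** 2
    return total

random.seed(0)
lam = {d: Fraction(random.randint(-3, 3)) for d in range(1, 41)}  # hypothetical toy lambda
print(quadratic_form(lam) == diagonal_form(lam))  # True
```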


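Lemma 28.4. Assume the above set-up and define

$$\zeta_\ell(a_1, \dots, a_k) = \mathbf 1_{a_\ell=1}\sum_{\substack{(a_1m_1, \dots, a_km_k)\in\mathcal D\\ m_\ell=1}}\frac{\lambda(a_1m_1, \dots, a_km_k)}{m_1\cdots m_k}.$$

If we let $X = \int_N^{2N}dt/\log t$, then for any fixed $A > 0$ we have

$$\sum_{N<p\le 2N}w_{p-s_\ell} = \frac{VX}{\prod_{p\le y}(1-1/p)}\sum_{(a_1, \dots, a_k)\in\mathcal D}\frac{\zeta_\ell(a_1, \dots, a_k)^2}{a_1\cdots a_k} + O_A(N/(\log N)^A).$$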


Proof. To simplify the notation, we consider the case $\ell = 1$; the proof of the other cases follows mutatis mutandis.

Since $w_n = n^{o(1)}$ by the divisor bound (see Exercise 2.9(f)), we have

$$\sum_{N+s_1<p\le 2N+s_1}w_{p-s_1} = \sum_{N<p\le 2N}w_{p-s_1} + O(\sqrt N). \tag{28.12}$$

If $W$ denotes the sum on the right side of (28.12), then

$$W = \sum_m\mu^+(m)\sum_{\mathbf d,\mathbf e\in\mathcal D}\lambda(\mathbf d)\lambda(\mathbf e)\sum_{\substack{N<p\le 2N,\ m\mid Q(p-s_1)\\ [d_j,e_j]\mid p-s_1+s_j\ \forall j}}1.$$

As in Lemma 28.3, we can only have $d_j, e_j \mid p - s_1 + s_j$ for all $j$ when $(d_ie_i, d_je_j) = 1$ for all $i \ne j$. Notice, though, that something special takes place when $j = 1$: we then have $d_1, e_1 \mid p - s_1 + s_1 = p$. Since a prime number $p$ has only trivial divisors and $d_1, e_1 \le N^{1/4} < p$, we conclude that $d_1 = e_1 = 1$. Similarly, if $m \mid Q(p - s_1) = \prod_{j=1}^k(p + s_j - s_1)$, then $m \mid Q^*(p)$ (recall that $m \mid P(y)$ and $p > N > y$), where

$$Q^*(x) = \prod_{j=2}^k(x + s_j - s_1).$$


As a consequence,

$$W = \sum_m\mu^+(m)\sum_{\substack{\mathbf d,\mathbf e\in\mathcal D,\ d_1=e_1=1\\ (d_ie_i,d_je_j)=1\ \forall i\ne j}}\lambda(\mathbf d)\lambda(\mathbf e)\sum_{\substack{N<p\le 2N,\ m\mid Q^*(p)\\ [d_j,e_j]\mid p+s_j-s_1\ \forall j\ge 2}}1. \tag{28.13}$$

To evaluate the innermost sum, we adapt the argument leading to (28.8). If $q = m\prod_{j=2}^k[d_j, e_j]$, then the Chinese Remainder Theorem implies that the number of $x \pmod q$ such that $m \mid Q^*(x)$ and $[d_j, e_j] \mid x + s_j - s_1$ for all $j \ge 2$ equals $\#\{x \pmod m : m \mid Q^*(x)\}$. However, since $p$ is prime, we must only count solutions that are reduced residues mod $q$. Whenever $x \equiv s_1 - s_j \pmod{[d_j, e_j]}$, we also have $(x, [d_j, e_j]) = 1$ because $(d_je_j, P(y)) = 1$ and $y > |s_1 - s_j|$ for each $j$. In conclusion, the number of reduced solutions mod $q$ is

$$\nu^*(m) := \#\{x \in (\mathbb Z/m\mathbb Z)^* : m \mid Q^*(x)\}.$$

Hence, the innermost sum in (28.13) equals

$$\frac{\nu^*(m)}{\varphi(q)}X + O(\nu^*(m)E(N, q)), \tag{28.14}$$

where

$$E(N, q) = \max_{(a,q)=1}\Big|\pi(2N; q, a) - \pi(N; q, a) - \frac{X}{\varphi(q)}\Big|.$$

The modulus $q$ here is an integer

$$\le Q := YD^2 = N^{1/2}\exp\big((\log\log N)^3 - 2\sqrt{\log N}\big)$$


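by our assumptions on the support of $\mu^+$ and of $\lambda$. Moreover, if we are given such an integer $q$, there are $\le \tau_{k-1}(q)\tau_3(q)$ ways to write it in the form $m\prod_{j=2}^k[d_j, e_j]$ with $m, d_2, \dots, d_k, e_2, \dots, e_k$ as in the right-hand side of (28.13). Since we also have $\nu^*(m) \le (k-1)^{\omega(m)}$, we arrive at the formula

$$W = V^*X\sum_{\substack{\mathbf d,\mathbf e\in\mathcal D,\ d_1=e_1=1\\ (d_ie_i,d_je_j)=1\ \forall i\ne j}}\frac{\lambda(\mathbf d)\lambda(\mathbf e)}{\prod_{j=2}^k\varphi([d_j,e_j])} + O(R^*)$$

with

$$V^* = \sum_m\frac{\mu^+(m)\nu^*(m)}{\varphi(m)} \quad\text{and}\quad R^* = \sum_{q\le Q}\tau_{k-1}(q)^2\tau_3(q)E(N, q).$$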


We use the Bombieri–Vinogradov theorem (see also Exercise 26.2) to find that $R^* = O_A(N/(\log N)^A)$. In addition, we note that $\nu^*(p) = \nu(p) - 1$ and use Theorem to find that $V^* = [1 + O_A(1/(\log N)^{A+3k})]\,V/\prod_{p\le y}(1 - 1/p)$. As a consequence,

$$W = \frac{VX}{\prod_{p\le y}(1-1/p)}\sum_{\substack{\mathbf d,\mathbf e\in\mathcal D,\ d_1=e_1=1\\ (d_ie_i,d_je_j)=1\ \forall i\ne j}}\frac{\lambda(\mathbf d)\lambda(\mathbf e)}{\prod_{j=2}^k\varphi([d_j,e_j])} + O_A\Big(\frac{N}{(\log N)^A}\Big),$$

where the error term from the estimation of $V^*$ was handled using (28.9).

Next, as in the proof of Lemma 28.3, we may remove the conditions $(d_ie_i, d_je_j) = 1$ when $i \ne j$ at the cost of an error of size $\ll N(\log N)^{3k}/y$. Finally, we may replace $\varphi([d_j, e_j])$ by $[d_j, e_j]$ using (28.11) at the cost of an error term of size $\ll N(\log N)^{3k+1}/y$. Hence, we arrive at the formula

$$W = \frac{VX}{\prod_{p\le y}(1-1/p)}\sum_{\substack{\mathbf d,\mathbf e\in\mathcal D\\ d_1=e_1=1}}\frac{\lambda(\mathbf d)\lambda(\mathbf e)}{\prod_{j=2}^k[d_j,e_j]} + O_A\Big(\frac{N}{(\log N)^A}\Big),$$

which is analogous to (28.10). It is now straightforward to adapt the argument from the proof of Lemma 28.3 that estimates the right-hand side of (28.10), and to complete the proof of the lemma.


A change of variables à la Selberg


Motivated by the theory of the Selberg sieve, we will switch from the function $\lambda$ to the function $\xi$ defined in Lemma 28.3. We must write $\zeta_\ell$ in terms of this new function.


Lemma 28.5. For each $(a_1, \dots, a_k) \in \mathcal D$ and each $\ell \in \{1, \dots, k\}$, we have

$$\zeta_\ell(a_1, \dots, a_k) = \mathbf 1_{a_\ell=1}\sum_b\frac{\mu(b)\,\xi(a_1, \dots, a_{\ell-1}, b, a_{\ell+1}, \dots, a_k)}{b}.$$

Proof. To ease the notation, we demonstrate the calculation when $\ell = 1$. As in the proof of Theorem 21.1 (see the argument leading to relation (21.5)), we have the inversion formula

$$\lambda(d_1, \dots, d_k) = \mathbf 1_{(d_1, \dots, d_k)\in\mathcal D}\sum_{b_1, \dots, b_k}\frac{\mu(b_1)\cdots\mu(b_k)\,\xi(b_1d_1, \dots, b_kd_k)}{b_1\cdots b_k}. \tag{28.15}$$

Consequently, if $(a_1, \dots, a_k) \in \mathcal D$ with $a_1 = 1$, then

$$\begin{aligned}\zeta_1(a_1, a_2, \dots, a_k) &= \sum_{d_2, \dots, d_k}\frac{\lambda(1, a_2d_2, \dots, a_kd_k)}{d_2\cdots d_k}\\ &= \sum_{d_2, \dots, d_k}\ \sum_{b_1, \dots, b_k}\frac{\mu(b_1)\cdots\mu(b_k)\,\xi(b_1, a_2b_2d_2, \dots, a_kb_kd_k)}{b_1\,b_2d_2\cdots b_kd_k}.\end{aligned}$$

Making the change of variables $m_j = b_jd_j$ for all $j > 1$ implies that

$$\zeta_1(a_1, a_2, \dots, a_k) = \sum_{b_1, m_2, \dots, m_k}\frac{\mu(b_1)\,\xi(b_1, a_2m_2, \dots, a_km_k)}{b_1m_2\cdots m_k}\prod_{j=2}^k\sum_{b_jd_j=m_j}\mu(b_j).$$

The innermost sum vanishes unless $m_j = 1$, thus completing the proof.
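Both the inversion formula (28.15) and Lemma 28.5 are exact identities, so they can be tested with rational arithmetic. The Python sketch below does this for $k = 2$ and $\ell = 1$, using a hypothetical random $\lambda$ supported on pairs $(d_1, d_2)$ with $d_1d_2 \le 24$. The condition $P^-(d_j) > y$ from the definition of $\mathcal D$ is dropped, since neither identity depends on it.

```python
import random
from fractions import Fraction

def mobius(n):
    """Möbius function by trial factorization."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

DMAX = 24  # hypothetical toy support: pairs (d1, d2) with d1 * d2 <= DMAX
SUPPORT = [(d1, d2) for d1 in range(1, DMAX + 1) for d2 in range(1, DMAX // d1 + 1)]
random.seed(2)
LAM = {d: Fraction(random.randint(-4, 4)) for d in SUPPORT}

def lam(d1, d2):
    return LAM.get((d1, d2), Fraction(0))

def xi(a1, a2):
    """xi(a1, a2) = sum over (a1 m1, a2 m2) in the support of lam(a1 m1, a2 m2) / (m1 m2)."""
    total = Fraction(0)
    for m1 in range(1, DMAX // (a1 * a2) + 1):
        for m2 in range(1, DMAX // (a1 * m1 * a2) + 1):
            total += lam(a1 * m1, a2 * m2) / (m1 * m2)
    return total

def zeta1(a2):
    """zeta_1(1, a2): the defining sum restricted to m1 = 1."""
    return sum((lam(1, a2 * m2) / m2 for m2 in range(1, DMAX // a2 + 1)), Fraction(0))

def zeta1_via_xi(a2):
    """Lemma 28.5 with k = 2, ell = 1: sum over b of mu(b) xi(b, a2) / b."""
    return sum((mobius(b) * xi(b, a2) / b for b in range(1, DMAX // a2 + 1)), Fraction(0))

def lam_via_xi(d1, d2):
    """Right-hand side of (28.15) with k = 2 (terms with b1 d1 b2 d2 > DMAX vanish)."""
    total = Fraction(0)
    for b1 in range(1, DMAX // (d1 * d2) + 1):
        for b2 in range(1, DMAX // (b1 * d1 * d2) + 1):
            total += mobius(b1) * mobius(b2) * xi(b1 * d1, b2 * d2) / (b1 * b2)
    return total
```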


Choosing the function ξ


Motivated by Lemma 28.5, we set

$$\xi(a_1, \dots, a_k) = \frac{\mathbf 1_{P^-(a_1\cdots a_k)>y}\,\mu^2(a_1\cdots a_k)\,(-1)^{\omega(a_1\cdots a_k)}}{(\log D)^k\prod_{p\le y}(1-1/p)^k}\,f\Big(\frac{\log a_1}{\log D}, \dots, \frac{\log a_k}{\log D}\Big),$$

where $f$ is a smooth function supported on the simplex

$$\Delta_k = \{(x_1, \dots, x_k) \in [0, 1]^k : x_1 + \cdots + x_k \le 1\},$$

and the factor $(-1)^{\omega(a_1\cdots a_k)}$ is introduced to annihilate the sign changes caused by $\mu(b)$ in the expression for $\zeta_\ell$ in Lemma 28.5. Lastly, the denominator $(\log D)^k\prod_{p\le y}(1 - 1/p)^k$ is introduced for normalization purposes, so that $\|\lambda\|_\infty = O(1)$ by (28.15), as needed. With this choice of $\xi$, we have the following result.


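Lemma 28.6. Let $\ell \in \{1, \dots, k\}$, and set

$$I(f) = \int_{\mathbb R^k}f(x_1, \dots, x_k)^2\,dx_1\cdots dx_k$$

and

$$J_\ell(f) = \int_{\mathbb R^{k-1}}\Big(\int_{\mathbb R}f(x_1, \dots, x_k)\,dx_\ell\Big)^2dx_1\cdots dx_{\ell-1}\,dx_{\ell+1}\cdots dx_k.$$

If $\xi$ and $f$ are as above, and we assume that $I(f) \ge 1$, then

$$\mathbb E_{N<n\le 2N}[\mathbf 1_{\mathbb P}(n+s_\ell)] = \frac{J_\ell(f)}{4I(f)} + O\big(1/\sqrt{\log N}\big).$$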


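The proof of Lemma 28.6 rests on the following result, which we will eventually apply with $w = y$, so that $u = \log D/\log y \asymp (\log N)/(\log\log N)^2$.

Lemma 28.7. If $r \in \mathbb Z_{\ge1}$, $g : \mathbb R^r \to \mathbb R$ is a smooth function supported on $\Delta_r$, and $u, w \ge 2$ are such that $D = w^u$, then

$$\sum_{\substack{n_1, \dots, n_r\\ P^-(n_j)>w\ (1\le j\le r)}}\frac{g\big(\frac{\log n_1}{\log D}, \dots, \frac{\log n_r}{\log D}\big)}{n_1\cdots n_r} = (\log D)^r\prod_{p\le w}\Big(1 - \frac1p\Big)^r\Big(\int_{\Delta_r}g + O\Big(\frac{\log u}{u}\Big)\Big),$$

where the implied constant depends at most on $r$, and on the supremum norm of $g$ and of its partial derivatives.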




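Proof. All implied constants might depend on $g$ and on $r$ as described in the statement of the lemma. Throughout the proof, we set

$$z = w^{\log u} = D^{(\log u)/u} \quad\text{and}\quad G(x_1, \dots, x_r) = \frac{g\big(\frac{\log x_1}{\log D}, \dots, \frac{\log x_r}{\log D}\big)}{x_1\cdots x_r}.$$

Note that

$$G(x_1, \dots, x_r) \ll \frac{1}{x_1\cdots x_r}, \qquad \frac{\partial G}{\partial x_j}(x_1, \dots, x_r) \ll \frac{1}{x_1\cdots x_r\,x_j}. \tag{28.16}$$

We will often denote the $r$-tuple $(x_1, \dots, x_r)$ by the bold letter $\mathbf x$.

Given a parameter $X \ge 1$, let

$$\mathcal D_X := \{(t_1, \dots, t_r) \in [X, +\infty)^r : t_1\cdots t_r \le D\}.$$

We first show we may restrict our attention to tuples $(n_1, \dots, n_r) \in \mathcal D_z$. Indeed, for each fixed $j \in \{1, \dots, r\}$, the contribution to the sum in the statement of the lemma of those summands with $n_j < z$ is

$$\ll \sum_{\substack{n_j<z,\ n_i\le D\ \forall i\ne j\\ P^-(n_1\cdots n_r)>w}}\frac{1}{n_1\cdots n_r} \ll (\log D)^{r-1}(\log z)\prod_{p\le w}\Big(1 - \frac1p\Big)^r,$$

which is of admissible size by our choice of $z$.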

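Next, we treat the part of the sum over $(n_1, \dots, n_r) \in \mathcal D_z$ by splitting it into small rectangles. We set $z_0 = z$ and, having defined $z_a$, we set $z_{a+1} = z_a + \sqrt{z_a}$. Moreover, let $\mathcal I$ denote the set of rectangles of the form $I = \prod_{j=1}^r(x_j, x_j + \sqrt{x_j}]$, where $x_1, \dots, x_r \in \{z_0, z_1, \dots\}$. Finally, we write $\mathcal D_z'$ for the union of all such rectangles that lie entirely within $\mathcal D_z$, that is to say, $\mathcal D_z'$ is the union of the $I$ of the above form for which $\prod_{j=1}^r(x_j + \sqrt{x_j}) \le D$. In particular, if $(x_1, \dots, x_r) \in \mathcal D_z \setminus \mathcal D_z'$, then $D/(1 + z^{-1/2})^r < x_1\cdots x_r \le D$. Combining this with the first bound of (28.16), we find that

$$\sum_{\substack{\mathbf n\in\mathcal D_z\\ P^-(n_1\cdots n_r)>w}}G(\mathbf n) = \sum_{\substack{I\in\mathcal I\\ I\subseteq\mathcal D_z'}}\ \sum_{\substack{\mathbf n\in I\\ P^-(n_1\cdots n_r)>w}}G(\mathbf n) + O(\mathcal N/D), \tag{28.17}$$

where $\mathcal N := \#\{(n_1, \dots, n_r) \in \mathbb N^r : n_1\cdots n_r \in [D/(1+z^{-1/2})^r, D]\}$. Exercise implies that $\mathcal N \ll D(\log D)^{r-1}/\sqrt z + D^{1-1/r}$, so that the error term in (28.17) is of admissible size.

Now, fix $I = \prod_{j=1}^r(x_j, x_j + \sqrt{x_j}] \subseteq \mathcal D_z$. Since $I$ has volume $\sqrt{x_1\cdots x_r}$, we have

$$G(\mathbf n) = \frac{1}{\sqrt{x_1\cdots x_r}}\int_IG(\mathbf n)\,d\mathbf t.$$

When $\mathbf n, \mathbf t \in I$, the Mean Value Theorem and the second bound of (28.16) imply that $G(\mathbf n) - G(\mathbf t) = O(z^{-1/2}/(t_1\cdots t_r))$, since $1/(1 + 1/\sqrt z) \le t_j/n_j \le 1 + 1/\sqrt z$ for all $j$. We thus conclude that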


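$$G(\mathbf n) = \frac{1}{\sqrt{x_1\cdots x_r}}\int_I\Big(G(\mathbf t) + O\Big(\frac{z^{-1/2}}{t_1\cdots t_r}\Big)\Big)d\mathbf t$$

for all $\mathbf n \in I$. In addition, applying Theorem 18.1(a) $r$ times, we find that

$$\#\{(n_1, \dots, n_r) \in I \cap \mathbb Z^r : P^-(n_1\cdots n_r) > w\} = (1 + O(1/u))\sqrt{x_1\cdots x_r}\prod_{p\le w}\Big(1 - \frac1p\Big)^r,$$

since $x_j \ge z = w^{\log u}$ for each $j$. Putting the above estimates together yields the formula

$$\sum_{\substack{\mathbf n\in I\\ P^-(n_1\cdots n_r)>w}}G(\mathbf n) = \prod_{p\le w}\Big(1 - \frac1p\Big)^r\int_I\Big(G(\mathbf t) + \frac{O(1)}{u\,t_1\cdots t_r}\Big)d\mathbf t.$$

Finally, we sum the above estimate over all rectangles $I \in \mathcal I$ with $I \subseteq \mathcal D_z'$; they are all subsets of $\mathcal D_z$ by definition. The lemma then follows by combining the resulting formula with (28.17), the change of variables $t_j = D^{x_j}$ (which gives $\int_{\mathcal D_1}G(\mathbf t)\,d\mathbf t = (\log D)^r\int_{\Delta_r}g$), and the fact that

$$\int_{\mathcal E}|G(\mathbf t)|\,d\mathbf t \ll \int_{\mathcal E}\frac{dt_1\cdots dt_r}{t_1\cdots t_r} \ll (\log D)^{r-1}\log z,$$

where $\mathcal E = (\mathcal D_1 \setminus \mathcal D_z) \cup (\mathcal D_z \setminus \mathcal D_z')$.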


Proof of Lemma 28.6. To ease the notation, we give the proof when $\ell = 1$; the other cases are similar. Let us begin by recalling relation (28.7). Together with Lemmas 28.3 and 28.4, it implies that

$$\mathbb E_{N<n\le 2N}[\mathbf 1_{\mathbb P}(n+s_1)] = \frac{VXS_1/\prod_{p\le y}(1-1/p) + O_A(N/(\log N)^A)}{VNT + O_A(N/(\log N)^A)}$$

for any $A$, where

$$S_1 = \sum_{\substack{(a_1, \dots, a_k)\in\mathcal D\\ a_1=1}}\frac{\zeta_1(a_1, \dots, a_k)^2}{a_1\cdots a_k} \quad\text{and}\quad T = \sum_{(a_1, \dots, a_k)\in\mathcal D}\frac{\xi(a_1, \dots, a_k)^2}{a_1\cdots a_k}.$$

The choice of $\xi$ and Lemma 28.7 (applied with $g = f^2$, $r = k$, $w = y$ and $u \asymp (\log N)/(\log\log N)^2$) imply that

$$T = L^{-k}\big(I(f) + O(1/\sqrt{\log N})\big), \tag{28.18}$$

where

$$L := (\log D)\prod_{p\le y}(1 - 1/p) \asymp \frac{\log N}{\log y}.$$

In addition, note that $V \gg 1/(\log y)^k$. Therefore, if we take $A = k + 2$, then $O(N/(\log N)^A) = O(VNL^{-k}/\log N)$. Since we also have that $X/N = 1/\log N + O(1/\log^2N)$ and $\log N = 4\log D + O(\sqrt{\log N})$, the lemma will follow as long as we can show that

$$S_1 = L^{1-k}\big(J_1(f) + O(1/\sqrt{\log N})\big). \tag{28.19}$$


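For any $\mathbf a = (a_1, \dots, a_k) \in \mathcal D$ with $a_1 = 1$, Lemma 28.5 and our choice of $\xi$ imply that

$$\zeta_1(\mathbf a) = \frac{(-1)^{\omega(a_2\cdots a_k)}}{L^k}\sum_{P^-(b)>y}\frac{\mu^2(b)}{b}f\Big(\frac{\log b}{\log D}, \frac{\log a_2}{\log D}, \dots, \frac{\log a_k}{\log D}\Big).$$

We remove the weight $\mu^2(b)$ by noticing that if $\mu^2(b) = 0$ and $P^-(b) > y$, then there is a prime $p > y$ such that $p^2 \mid b$. Hence, the total error produced by replacing $\mu^2(b)$ with 1 is $\ll L^{1-k}/y$. To the rest of the sum, we apply Lemma 28.7 with $r = 1$, $w = y$ and $u \asymp (\log N)/(\log\log N)^2$. This yields the estimate

$$\zeta_1(\mathbf a) = \frac{(-1)^{\omega(a_2\cdots a_k)}}{L^{k-1}}\Big(\int_0^1f\Big(x_1, \frac{\log a_2}{\log D}, \dots, \frac{\log a_k}{\log D}\Big)dx_1 + O\big(1/\sqrt{\log N}\big)\Big).$$

Therefore

$$S_1 = L^{2-2k}\sum_{P^-(a_2\cdots a_k)>y}\frac{\Big(\int_0^1f\big(x_1, \frac{\log a_2}{\log D}, \dots, \frac{\log a_k}{\log D}\big)dx_1\Big)^2}{a_2\cdots a_k} + O\Big(\frac{L^{1-k}}{\sqrt{\log N}}\Big),$$

where we used an upper bound sieve to control the contribution of the remainder terms. Finally, we apply Lemma 28.7 again, this time with $r = k - 1$, to deduce (28.19). This completes the proof of the lemma.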


Optimizing the function f 


In view of Lemma 28.6, our goal is to choose $f$ supported on $\Delta_k$ and maximizing the ratio

$$\rho_k(f) := \frac{\sum_{\ell=1}^kJ_\ell(f)}{I(f)}.$$

If we can show that $\rho_k(f) > 4m$ for $k$ large enough in terms of $m$, we automatically conclude that $\liminf_{n\to\infty}(p_{n+m} - p_n) \le s_k - s_1 < \infty$.


As a warm-up exercise, we study $\rho_k(f)$ using the calculus of variations. Note that we may drop the assumption that $f$ is smooth, since the integral of any measurable function over a compact region can be approximated arbitrarily closely by integrals of smooth functions. Consider the linear operator

$$(\mathcal L_kf)(x_1, \dots, x_k) = \sum_{\ell=1}^k\int_{\mathbb R}f(x_1, \dots, x_{\ell-1}, t, x_{\ell+1}, \dots, x_k)\,dt$$

acting on $C_c(\mathbb R^k)$, the space of compactly supported, continuous functions $f : \mathbb R^k \to \mathbb R$. Letting $\langle f, g\rangle = \int_{\mathbb R^k}fg$, we find that


$$\sum_{\ell=1}^kJ_\ell(f) = \langle\mathcal L_kf, f\rangle,$$

whereas $I(f) = \langle f, f\rangle$. If, now, $f$ is a maximizer of the function $\rho_k(\cdot)$ over all $f$ supported on $\Delta_k$, then the function $\epsilon \mapsto \rho_k(f + \epsilon g)$ has a maximum at $\epsilon = 0$ for any continuous $g : \mathbb R^k \to \mathbb R$ supported on $\Delta_k$. So its derivative at $\epsilon = 0$ must vanish, which implies that

$$\langle\mathcal L_kf, g\rangle + \langle\mathcal L_kg, f\rangle = 2\rho_k(f)\langle f, g\rangle.$$

It is easy to see that $\mathcal L_k$ is a self-adjoint operator, so we find that

$$\langle\mathcal L_kf, g\rangle = \rho_k(f)\langle f, g\rangle. \tag{28.20}$$

Lastly, a standard continuity argument allows us to extend (28.20) to all bounded measurable functions $g$ that are supported on $\Delta_k$.

We apply (28.20) for a special choice of $g$. Let $(B_n)_{n=1}^\infty$ be a shrinking family of cubes centered at a given point $(x_1, \dots, x_k)$ in the interior of the simplex $\Delta_k$, and take $g = \mathbf 1_{B_n}/\mathrm{Vol}(B_n)$ with $n \to \infty$. Applying (28.20) to this family of functions $g$, we deduce that $(\mathcal L_kf)(x_1, \dots, x_k) = \rho_k(f)f(x_1, \dots, x_k)$. Since $(x_1, \dots, x_k)$ is arbitrary and $f$ is continuous, this implies that $(\mathcal L_kf)|_{\Delta_k} = \rho_k(f)\cdot f$. In particular, $f$ is an eigenfunction of the operator $\mathcal L_k' : g \mapsto (\mathcal L_kg)|_{\Delta_k}$ that acts on continuous functions $g : \Delta_k \to \mathbb R$. The corresponding eigenvalue is $\rho_k(f)$.


Now, note that if $f$ is an eigenfunction of the operator $\mathcal L_k'$ of eigenvalue $\rho_k(f)$, so is its symmetric version

$$\tilde f(x_1, \dots, x_k) := \sum_{\sigma\in S_k}f(x_{\sigma(1)}, \dots, x_{\sigma(k)}).$$

In light of this observation, we may restrict our attention to symmetric functions $f$, in which case

$$\rho_k(f) = kJ_1(f)/I(f). \tag{28.21}$$

Accordingly, we define

$$R_k := \sup\{\rho_k(f) : f : \Delta_k \to \mathbb R,\ f\text{ symmetric and continuous}\}.$$

An asymptotic estimate for $R_k$ is given in Proposition 28.8 below. In addition, explicit bounds on $R_k$ can be found in [138, 151].


Proposition 28.8. For large integers $k$, we have that

$$\log k - 4\log\log k + O(1) \le R_k \le \log k + \log\log k + O(1).$$




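Proof. For the lower bound, we consider functions of the form

$$f(x_1, \dots, x_k) = \mathbf 1_{(x_1, \dots, x_k)\in\Delta_k}\cdot g(kx_1)\cdots g(kx_k),$$

where $g : \mathbb R \to \mathbb R_{\ge0}$ is a function supported on the interval $[0, \delta k]$ with $\delta \in (0, 1)$ to be chosen later, and such that $\int_0^\infty g(t)^2dt = 1$. Then

$$I(f) = \int_{\mathbb R^k}f(x_1, \dots, x_k)^2\,dx_1\cdots dx_k \le \Big(\int_0^\infty g(kx)^2dx\Big)^k = \frac{1}{k^k}.$$

Together with (28.21) and the change of variables $t_j = kx_j$, this implies that

$$\begin{aligned}\rho_k(f) &\ge \int_{\mathbb R_{\ge0}^{k-1}}\Big(\int_0^{k-(t_2+\cdots+t_k)}g(t_1)\,dt_1\Big)^2g(t_2)^2\cdots g(t_k)^2\,dt_2\cdots dt_k\\ &\ge \Big(\int_0^\infty g(t)\,dt\Big)^2\int_{t_2+\cdots+t_k\le(1-\delta)k}g(t_2)^2\cdots g(t_k)^2\,dt_2\cdots dt_k\\ &= \Big(\int_0^\infty g(t)\,dt\Big)^2\,\mathbb P\big(X_2 + \cdots + X_k \le (1-\delta)k\big),\end{aligned}$$

where $X_2, \dots, X_k$ are independent random variables with density function $g^2$. Let

$$\mu = \mathbb E[X_j] = \int_0^\infty tg(t)^2\,dt$$

and $Y_i = X_i - \mu$, $2 \le i \le k$, so that $Y_2, \dots, Y_k$ are mean-zero independent random variables that are identically distributed. If we assume that $\delta < 1 - \mu$, then

$$\mathbb P\big(X_2 + \cdots + X_k > (1-\delta)k\big) \le \mathbb P\big(Y_2 + \cdots + Y_k > (1-\delta-\mu)k\big) \le \frac{\mathbb V[Y_2 + \cdots + Y_k]}{(1-\delta-\mu)^2k^2} \le \frac{\mathbb V[Y_2]}{(1-\delta-\mu)^2k}$$

by Chebyshev's inequality and the independence of the $Y_i$'s. Furthermore,

$$\mathbb V[Y_2] \le \mathbb E[X_2^2] = \int_0^{\delta k}t^2g(t)^2\,dt \le \delta k\int_0^\infty tg(t)^2\,dt = \delta k\mu$$

by our assumption that $g$ is supported on $[0, \delta k]$. In conclusion, we have

$$R_k \ge \rho_k(f) \ge \Big(\int_0^\infty g(t)\,dt\Big)^2\Big(1 - \frac{\delta\mu}{(1-\delta-\mu)^2}\Big) \tag{28.22}$$

for any measurable function $g \ge 0$ supported on $[0, \delta k]$ with $\int_0^\infty g^2 = 1$ and $\mu = \int_0^\infty ug(u)^2\,du < 1 - \delta$. We choose

$$g(t) = \frac{c\,\mathbf 1_{[0,\delta k]}(t)}{1 + At},$$

where the parameters $\delta$, $c$, $A$ will be determined shortly.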




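First of all, note that the hypothesis that $\int_0^\infty g^2 = 1$ implies that

$$\frac{1}{c^2} = \int_0^{\delta k}\frac{dt}{(1+At)^2} = \frac{1}{A\,(1 + 1/(A\delta k))}.$$

Hence

$$\mu = \int_0^{\delta k}\frac{c^2t\,dt}{(1+At)^2} = \frac{1 + 1/(A\delta k)}{A}\log(1 + A\delta k) - \frac1A.$$

To force $\mu$ to be close to 1 and $\delta$ to be smaller than $1 - \mu$, we take $A = \log k$ and $\delta = 1/(\log k)^3$, so that

$$\mu = \frac{1}{\log k}\Big(\log\frac{k}{(\log k)^2} + O(1)\Big) = 1 - \frac{2\log\log k}{\log k} + O\Big(\frac{1}{\log k}\Big) \le 1 - \delta - \frac{1}{\log k}$$

for $k$ large enough. Since $\int_0^\infty g(t)\,dt = c\log(1 + A\delta k)/A$, the above inequality and (28.22) imply the lower bound

$$R_k \ge \frac{c^2\log^2(1 + A\delta k)}{A^2}\Big(1 - \frac{\delta\mu}{(1-\delta-\mu)^2}\Big) \ge \frac{\log^2(k/(\log k)^2)}{\log k}\Big(1 - \frac{1}{\log k}\Big) = \log k - 4\log\log k + O(1),$$

as claimed.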

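Finally, we prove the upper bound on $R_k$. Let $f$ be a symmetric, measurable function supported on $\Delta_k$. Motivated by the shape of $f$ yielding our lower bound on $R_k$, we fix a parameter $A > 0$ and use the Cauchy–Schwarz inequality in the following fashion:

$$\Big(\int f(\mathbf x)\,dx_1\Big)^2 = \Big(\int\frac{\sqrt{1+kAx_1}\,f(\mathbf x)}{\sqrt{1+kAx_1}}\,dx_1\Big)^2 \le \int_0^1\frac{dx_1}{1+kAx_1}\int(1+kAx_1)f(\mathbf x)^2\,dx_1 = \frac{\log(1+kA)}{kA}\int(1+kAx_1)f(\mathbf x)^2\,dx_1.$$

Therefore

$$J_1(f) \le \frac{\log(1+kA)}{kA}\int_{\mathbb R^k}(1+kAx_1)f(\mathbf x)^2\,d\mathbf x.$$

By symmetry,

$$\rho_k(f)\int_{\mathbb R^k}f(\mathbf x)^2\,d\mathbf x = \sum_{\ell=1}^kJ_\ell(f) \le \frac{\log(1+kA)}{kA}\sum_{\ell=1}^k\int_{\mathbb R^k}(1+kAx_\ell)f(\mathbf x)^2\,d\mathbf x.$$

Since $x_1 + \cdots + x_k \le 1$ in the support of $f$, we have $\sum_{\ell=1}^k(1 + kAx_\ell) \le k(1 + A)$ there, and we conclude that $\rho_k(f) \le (1+A)\log(1+kA)/A$. Taking $A = \log k$ completes the proof.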
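For concrete values of $k$, the two bounds in this proof can be evaluated numerically. The Python sketch below uses the closed forms for $c^2$, $\mu$ and $\int g$ derived above (with $A = \log k$, $\delta = (\log k)^{-3}$), cross-checks them against a midpoint-rule integration, and evaluates the right-hand side of (28.22) alongside the upper bound $(1+A)\log(1+kA)/A$. The function names and the test values of $k$ are our own; the numbers only illustrate the asymptotic statement, whose $O(1)$ terms are not tracked.

```python
import math

def midpoint(f, a, b, steps=200_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def lower_bound_parameters(k):
    """Closed forms for c^2, mu and int g when A = log k and delta = (log k)^-3."""
    A, delta = math.log(k), math.log(k) ** -3
    T = delta * k                                   # g is supported on [0, T]
    c2 = A * (1 + 1 / (A * T))                      # from the normalization int g^2 = 1
    mu = (1 + 1 / (A * T)) / A * math.log(1 + A * T) - 1 / A
    int_g = math.sqrt(c2) * math.log(1 + A * T) / A
    return A, delta, T, c2, mu, int_g

def closed_form_discrepancy(k):
    """Largest |numerical integral - closed form| for 1/c^2, mu and int g (sanity check)."""
    A, delta, T, c2, mu, int_g = lower_bound_parameters(k)
    checks = [
        (midpoint(lambda t: 1 / (1 + A * t) ** 2, 0, T), 1 / c2),
        (midpoint(lambda t: c2 * t / (1 + A * t) ** 2, 0, T), mu),
        (midpoint(lambda t: math.sqrt(c2) / (1 + A * t), 0, T), int_g),
    ]
    return max(abs(numeric - exact) for numeric, exact in checks)

def lower_bound(k):
    """Right-hand side of (28.22) for g(t) = c 1_[0, delta k](t) / (1 + A t)."""
    A, delta, T, c2, mu, int_g = lower_bound_parameters(k)
    assert mu < 1 - delta, "hypothesis of (28.22) fails"
    return int_g ** 2 * (1 - delta * mu / (1 - delta - mu) ** 2)

def upper_bound(k):
    """(1 + A) log(1 + kA) / A with A = log k."""
    A = math.log(k)
    return (1 + A) * math.log(1 + k * A) / A

for k in (10**3, 10**6, 10**9):
    print(k, round(lower_bound(k), 3), round(upper_bound(k), 3))
```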




We may now complete the proof of the main result of this chapter.

Proof of Theorem 28.1. Combining Lemma 28.6 and Proposition 28.8, we find that there is a choice of weights $w_n$ such that

$$\mathbb E_{N<n\le 2N}\Big[\sum_{1\le\ell\le k}\mathbf 1_{\mathbb P}(n+s_\ell)\Big] \ge \frac{\log k}{4} - \log\log k + O(1).$$

We take $k = \lceil Cm^4e^{4m}\rceil$ for a large enough constant $C$ so that the right-hand side becomes $> m$. In particular, there must exist $n \in (N, 2N]$ such that $n + s_j$ is prime for at least $m + 1$ values of $j$. Since $N$ can be taken to be arbitrarily large, we conclude that $\liminf_{n\to\infty}(p_{n+m} - p_n) \le s_k - s_1$. We take $s_j$ to be the $j$th prime that is $> k$. We may easily check that the tuple $(s_1, \dots, s_k)$ is admissible: no $s_j$ is divisible by a prime $p \le k$, while for $p > k$ the $k$ residues $s_j \pmod p$ cannot cover $\mathbb Z/p\mathbb Z$. Since $s_k \ll k\log k \ll e^{4m}m^5$ by the Prime Number Theorem, we have completed the proof of Theorem 28.1.


Exercises 


Exercise 28.1. (a) Fix coprime $a, q \in \mathbb N$ and let $p_1' < p_2' < \cdots$ be the sequence of primes $\equiv a \pmod q$. Show that $\liminf_{n\to\infty}(p_{n+1}' - p_n') < \infty$.

(b) Let $q_1 < q_2 < \cdots$ be an infinite sequence of primes. Find necessary conditions so that $\liminf_{n\to\infty}(q_{n+1} - q_n) < \infty$.

Exercise 28.2. Let $S$ be the set of integers $s \ge 1$ for which there are infinitely many primes $p$ such that $p + 2s$ is also prime. For every $x$ that is sufficiently large, show that $\#(S \cap [1, x]) \gg x$. [Hint: Show that there is some $H$ such that $S \cap (m, m+H] \ne \emptyset$ for all $m \ge 1$.]


Exercise 28.3. (a) If $f(x_1, \dots, x_k) = F(x_1 + \cdots + x_k)\mathbf 1_{x_1, \dots, x_k\ge0}$ with $F : [0, 1] \to \mathbb R$ continuous³, then show that

$$\rho_k(f) = \rho_k(F) := k(k-1)\,\frac{\int_0^1u^{k-2}\big(\int_u^1F\big)^2du}{\int_0^1u^{k-1}F(u)^2\,du}. \tag{28.23}$$

(b) (Goldston–Pintz–Yıldırım [63]) Show that $\sup_F\rho_k(F) \ge 4 - o_{k\to\infty}(1)$. [Hint: Take $F(u) = (1-u)^\ell$.]

(c) Assuming the Bombieri–Vinogradov theorem (Theorem 18.9) holds for $Q \le x^\theta$ with $\theta > 1/2$, deduce that $\liminf_{n\to\infty}(p_{n+1} - p_n) < \infty$.

(d) (Soundararajan) Integrate by parts to show that

$$\rho_k(F) = 2k\,\frac{\int_0^1u^{k-1}F(u)\big(\int_u^1F\big)du}{\int_0^1u^{k-1}F(u)^2\,du}.$$

Generalize this argument to conclude that $\rho_k(F) \le 4k/\prod_{j=1}^\infty(k + 2^j - 2)^{1/2^j} < 4$ for all $F \ne 0$.


³This corresponds essentially to the definition (28.4) of the GPY weights. There, one must also assume that $F$ is smooth enough, but this is not needed when optimizing the quantity $\rho_k(f)$.
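For the hint in Exercise 28.3(b), both integrals in (28.23) are Beta integrals when $F(u) = (1-u)^\ell$, because $\int_u^1F = (1-u)^{\ell+1}/(\ell+1)$. The Python sketch below (function names are our own) evaluates $\rho_k(F)$ exactly with rational arithmetic. Simplifying the Beta values gives $\rho_k = 2k(2\ell+1)/((k+2\ell+1)(\ell+1))$, which approaches 4 but stays below it when $\ell \approx \sqrt k/2$, as the exercise predicts.

```python
from fractions import Fraction
from math import factorial

def beta(a, b):
    """B(a, b) = (a-1)! (b-1)! / (a+b-1)! for positive integers a, b (exact)."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def rho_power(k, ell):
    """rho_k(F) from (28.23) for F(u) = (1 - u)^ell and k >= 2."""
    numerator = k * (k - 1) * beta(k - 1, 2 * ell + 3) / (ell + 1) ** 2
    denominator = beta(k, 2 * ell + 1)
    return numerator / denominator

best = max(rho_power(400, ell) for ell in range(40))
print(rho_power(2, 0), float(best))  # 4/3 and roughly 3.63
```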




Remark 28.9. Exercise 28.3(d) proves that the original GPY weights cannot prove that $\liminf_{n\to\infty}(p_{n+1} - p_n) < \infty$ without access to an improved version of the Bombieri–Vinogradov theorem. Zhang's breakthrough was to supply this necessary improvement. In contrast, with the more general Maynard–Tao weights, the quantity $\rho_k(f)$ can become arbitrarily large as $k \to \infty$, thus sidestepping the need for an improved Bombieri–Vinogradov theorem.


Exercise 28.4 (Conrey). Define $\rho_k(F)$ by (28.23) and let $\lambda = k(k-1)/\sup_F\rho_k(F)$.

(a) If there is $G \in C([0, 1])$ such that $\rho_k(G) = k(k-1)/\lambda$, then show that $G(x) > 0$ for all $x \in [0, 1]$. In addition, use the calculus of variations to show that $G$ must satisfy the integral equation

$$x^{k-1}G(x) = \lambda\int_0^xu^{k-2}\int_u^1G(t)\,dt\,du \qquad (0 \le x \le 1). \tag{28.24}$$

(b) Conversely, let $G$ be a continuous and non-negative function satisfying (28.24) and whose set of roots has null measure. Show that $\rho_k(G) = k(k-1)/\lambda$ and that $\rho_k(F) \le \rho_k(G)$ for any continuous $F : [0, 1] \to \mathbb R$. [Hint: For the second part, use the Cauchy–Schwarz inequality.]


(c) Show that any continuous solution to (28.24) must be smooth on $[0, 1]$ and satisfy the differential equation $xG''(x) + kG'(x) + \lambda G(x) = 0$. In addition, a solution to this differential equation that is analytic around 0 must be a multiple of the function $x \mapsto A(\lambda x)$, where

$$A(x) = \sum_{r=0}^\infty\frac{(-x)^r}{r!\,(r+k-1)!} = \frac{J_{k-1}(2\sqrt x)}{x^{(k-1)/2}},$$

with $J_{k-1}$ denoting the $(k-1)$th Bessel function of the first kind (see [183]).


(d) Let $A_n(x) = \sum_{r=0}^n(-x)^r/[r!\,(r+k-1)!]$. Show that a solution to (28.24) normalized so that $G(0) = 1/(k-1)!$ must satisfy

In addition, for $x \in [0, 1]$ and $n \in \mathbb Z_{\ge0}$, we must have

(e) Show that if there is a continuous and non-negative solution to (28.24) with $\lambda > 0$, then $A_{2n+1}(\lambda x) \le G(x) \le A_{2n}(\lambda x)$ for $x \in [0, 1]$ and $n \in \mathbb Z_{\ge0}$, and thus $G(x) = A(\lambda x)$ for $x \in [0, 1]$.

(f) When $G(x) = A(\lambda x)$, show that (28.25) is equivalent to $J_{k-2}(2\sqrt\lambda) = 0$.


Remark 28.10. When $m \in \mathbb N$, it is known that $J_m$ has infinitely many positive real zeros. If $z_m$ denotes the smallest such zero, we also know that $z_1 < z_2 < \cdots$ and $z_m > m + \pi - 1/2$ [60]. Hence, if $\lambda = z_{k-2}^2/4$, we infer that $A(\lambda x) > 0$ for $x \in [0, 1]$. In particular, (28.24) has a non-negative and continuous solution $G$ for which $\sup_F\rho_k(F) = \rho_k(G) = 4k(k-1)/z_{k-2}^2 < 4$. Thus, we recover the conclusion of Exercise 28.3(d).
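The value $4k(k-1)/z_{k-2}^2$ from Remark 28.10 can be computed for small $k$ with nothing more than the power series of $J_\nu$ and bisection. The Python sketch below takes the Remark's formula for $\sup_F\rho_k(F)$ as given. It checks that this value stays below 4 and is never beaten by the polynomial choices $F(u) = (1-u)^\ell$ of Exercise 28.3(b), for which $\rho_k = 2k(2\ell+1)/((k+2\ell+1)(\ell+1))$. The series truncation and step size are ad hoc choices that are adequate for the small orders used here.

```python
import math

def bessel_j(nu, x, terms=60):
    """J_nu(x) for an integer order nu >= 0, from its power series (fine for moderate x)."""
    return sum((-1) ** r * (x / 2) ** (2 * r + nu) / (math.factorial(r) * math.factorial(r + nu))
               for r in range(terms))

def first_positive_zero(nu, step=0.05):
    """Smallest positive zero of J_nu: march right until the sign changes, then bisect."""
    a = step
    while bessel_j(nu, a + step) > 0:
        a += step
    lo, hi = a, a + step
    for _ in range(60):
        mid = (lo + hi) / 2
        if bessel_j(nu, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def conrey_sup(k):
    """4k(k-1)/z_{k-2}^2, the value of sup_F rho_k(F) stated in Remark 28.10 (k >= 2)."""
    z = first_positive_zero(k - 2)
    return 4 * k * (k - 1) / z ** 2

def best_power_choice(k, max_ell=50):
    """Best rho_k(F) over F(u) = (1 - u)^ell, 0 <= ell <= max_ell."""
    return max(2 * k * (2 * l + 1) / ((k + 2 * l + 1) * (l + 1)) for l in range(max_ell + 1))

for k in (2, 5, 10):
    print(k, round(best_power_choice(k), 4), round(conrey_sup(k), 4))
```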


Chapter 29

Large gaps between primes


In the previous chapter, we demonstrated gaps in the sequence of primes $p_1 < p_2 < \cdots$ that are much smaller than the expected size of $p_{n+1} - p_n$, which is $\log n$. We now turn to the opposite question: does the gap $p_{n+1} - p_n$ get large compared to $\log n$? The answer should be affirmative. To see why, we turn again to Cramér's model.

Recall that $(X_k)_{k=1}^\infty$ is a sequence of independent Bernoulli random variables such that $X_1 = 0$, $X_2 = 1$, and with $\mathbb P(X_k = 1) = 1/\log k$ for $k \ge 3$. Let $P_n$ be the random variable that equals the $n$th smallest index $k$ such that $X_k = 1$ (that is to say, $P_n$ models the $n$th smallest prime number).


Proposition 29.1. With probability 1, we have

$$\limsup_{n\to\infty}\frac{P_{n+1} - P_n}{(\log n)^2} = 1.$$

The above result is a simple consequence of the Borel–Cantelli lemma from probability theory [7, Theorems 4.3 and 4.4].

Lemma 29.2 (Borel–Cantelli). Let $E_1, E_2, \dots$ be some events in a probability space, and let $E$ be the event that infinitely many of the $E_j$'s occur.

(a) If $\sum_{j\ge1}\mathbb P(E_j) < \infty$, then $\mathbb P(E) = 0$.
(b) If the events $E_1, E_2, \dots$ are mutually independent and $\sum_{j\ge1}\mathbb P(E_j) = \infty$, then $\mathbb P(E) = 1$.






Proof of Proposition 29.1. For each $k \in \mathbb N$, $r \ge 0$ and $\lambda > 0$, let $E_k(r, \lambda)$ be the event that $X_j = 1$ for at most $r$ integers $j \in (k, k + \lambda\log^2k]$. We will use the Borel–Cantelli lemma to prove two key facts about these events:

Claim 1. If $\lambda > 1$ is fixed, then with probability 1 at most finitely many of the events $E_k(1, \lambda)$ occur.

Claim 2. If $\lambda < 1$ is fixed, then with probability 1 infinitely many of the events $E_k(0, \lambda)$ occur.

We leave it as an exercise on Cramér's model to verify how these two claims can be combined to complete the proof of the proposition.

Now, let us prove Claims 1 and 2. If we let $I_k = \mathbb Z \cap (k, k + \lambda\log^2k]$, then the independence of the $X_j$'s implies that

$$\mathbb P(E_k(0, \lambda)) = \prod_{j\in I_k}\Big(1 - \frac{1}{\log j}\Big) = k^{-\lambda+o(1)}$$

and

$$\mathbb P(E_k(1, \lambda) \setminus E_k(0, \lambda)) = \sum_{i\in I_k}\frac{1}{\log i}\prod_{j\in I_k\setminus\{i\}}\Big(1 - \frac{1}{\log j}\Big) = k^{-\lambda+o(1)}$$

as $k \to \infty$. In particular, Claim 1 follows immediately from Lemma 29.2(a).

Finally, let us fix $\lambda < 1$ and prove Claim 2. If $k_j = \lfloor j\log^2j\rfloor$, then we may easily check that the events $E_{k_j}(0, \lambda)$ are mutually independent for large enough $j$, as well as that $\sum_j\mathbb P(E_{k_j}(0, \lambda)) = \infty$. Thus, Claim 2 follows from Lemma 29.2(b).
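The estimate $\mathbb P(E_k(0, \lambda)) = k^{-\lambda+o(1)}$ can be observed directly, since the probability is an explicit finite product. The Python sketch below (the function name is our own) computes $\log\mathbb P(E_k(0,\lambda))/\log k$ for $k = 10^8$. The result sits slightly below $-\lambda$, by roughly $\lambda/(2\log k)$, because of the second-order term in $\log(1 - 1/\log j)$.

```python
import math

def log_prob_empty_window(k, lam):
    """log P(E_k(0, lam)) in Cramér's model: sum of log(1 - 1/log j) over integers j in (k, k + lam log^2 k]."""
    upper = k + lam * math.log(k) ** 2
    total, j = 0.0, k + 1
    while j <= upper:
        total += math.log1p(-1 / math.log(j))
        j += 1
    return total

k = 10**8
for lam in (0.5, 1.0, 2.0):
    # The exponent log P / log k should be close to -lam, up to the o(1) term.
    print(lam, round(log_prob_empty_window(k, lam) / math.log(k), 4))
```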


Proposition 29.1 leads us to guess that

$$\limsup_{n\to\infty}\frac{p_{n+1} - p_n}{(\log n)^2} = 1.$$

However, Granville's refinement of Cramér's model suggests the lower bound

$$\limsup_{n\to\infty}\frac{p_{n+1} - p_n}{(\log n)^2} \ge 2e^{-\gamma} = 1.12291\ldots$$

(see Exercise 29.2). It is not clear what the true value of this limsup is (though see Exercise 30.1). In this chapter, we will prove a weaker result. A simple corollary of it is that the normalized gap $(p_{n+1} - p_n)/\log n$ can get arbitrarily large.
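For comparison with these conjectures, one can compute the largest normalized gap among the actual primes in a small range. The Python sketch below normalizes by $(\log p_n)^2$ rather than $(\log n)^2$. The two are asymptotically equivalent because $\log p_n \sim \log n$, but the finite values differ. It skips the primes below 11, where the ratio is distorted by tiny denominators. Such small-scale data cannot, of course, decide the value of the limsup.

```python
import math

def primes_up_to(x):
    """Sieve of Eratosthenes: list of all primes <= x."""
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(2, x + 1) if is_prime[n]]

def max_cramer_ratio(x, start=11):
    """Largest (p' - p)/(log p)^2 over consecutive primes p < p' <= x with p >= start."""
    ps = primes_up_to(x)
    return max(((q - p) / math.log(p) ** 2, p, q) for p, q in zip(ps, ps[1:]) if p >= start)

ratio, p, q = max_cramer_ratio(10**5)
print(f"largest normalized gap below 10^5: ({q} - {p}) / log^2 {p} = {ratio:.3f}")
```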


Theorem 29.3. We have that

$$\limsup_{n\to\infty}\frac{p_{n+1} - p_n}{L(n)} = \infty, \qquad\text{with}\qquad L(n) = \frac{\log n\,\log_2n\,\log_4n}{(\log_3n)^2},$$

where $\log_j$ denotes the $j$th iteration of the logarithmic function.




This theorem was proven independently by Ford, Green, Konyagin and 
Tao [46], and by Maynard [139]. Here, we follow Maynard’s argument which 
is more in tune with the ideas we have developed thus far. Ford, Green, 
Konyagin and Tao used a different technique, building on the work of Green 
and Tao on long arithmetic progressions in the sequence of primes [77]. 


The Erdős–Rankin construction


All constructions of long strings of composite numbers are based on the concept of a covering system of congruences. We say that the system of congruences $\{a_j \pmod{q_j}\}_j$ covers the set of integers $\mathcal N$ if for each $n \in \mathcal N$ there is some $j$ such that $n \equiv a_j \pmod{q_j}$.

Recall the notation $P(z) = \prod_{p\le z}p$. Our strategy for proving Theorem 29.3 is based on the following simple lemma.


Lemma 29.4. Let $H \ge 1$ and $z \ge 2$. Assume that there is a system of congruences $\{a_p \pmod p\}_{p\le z}$ that covers $\mathbb Z \cap [1, H]$. Then there exists some $n \in (P(z), 2P(z)]$ for which there are no primes in $(n, n+H]$.

Proof. Let $n$ be the unique integer in $(P(z), 2P(z)]$ satisfying the congruences $n \equiv -a_p \pmod p$ for all primes $p \le z$. If $h \in [1, H] \cap \mathbb Z$, then $h \equiv a_p \pmod p$ for some $p \le z$, that is to say, there exists a prime $p \le z$ that divides $n + h$. Since $n + h > P(z) \ge p$, the number $n + h$ cannot be prime.


In preparation for our proof of Theorem [29.3] we first show the weaker 
result due to Rankin that 


R Pn+1— Pn 
29.1 lim sup ————— 
Even though the gap between (29.1) and Theorem[29.3]seems small, making 
this leap was a long-standing open problem due to Erdés[}] 
To prove (29.1), we fix a small constant c > 0 to be chosen later. In 
addition, we let z > 2 be a parameter tending to infinity, X = P(z) and 
H (log, H = 
clog H logs H’ 


> 0. 


cz log z logs z 
(logs 2)? 
We will show that we can pick congruence classes ap(modp) with p < z 
covering the integers < H. Assuming that this is indeed possible, we can 


then apply Lemma [29.4] to find that maxx <p, <2x(Pn+1 — Pn) > H. Since 
log X ~ z by the Prime Number Theorem, we deduce that 


(29.2) fe ; whence Ze 


1Paul Erdés had a legendary knack for asking very hard questions with deceptively simple 
statements. Occasionally, he would offer a monetary award for their solution. The award he 
offered for proving Theorem was $10,000, the largest “Erdés prize” ever. 


320 29. Large gaps between primes 


(29.3) lim sup 


which establishes (29.1). 


Let us now explain how to construct the classes ap(modp). We will 
select them in three stages, determined by two parameters y and Y to be 
chosen so that (log H)? < y < Y < H/2. Throughout, the letters p and q 
denote prime numbers. 


Stage 1: Intermediate primes. When p € (y,Y], we select ap = 0. 
This choice is the key to the success of the Erdés-Rankin method because 
it leaves uncovered few integers < H. Specifically, if VV is the set of integers 
n < H not covered by the classes 0 (mod p) for p € (y, Y], then either n is 
y-smooth, or n = mq with q > Y prime and m < H/Y. Writing y = H1/" 
and applying Theorems [16.4] and we find that 


A A 
< UH, H/m)< + —— 
W1< UH) + Yo mim) <+ oy 
m<H/Y m<H/Y 
HH A log(H/Y 
2 og(H/Y) 
The log Y 


Stage 2: Small primes. For the primes p < y, we select the progressions 
dy (mod p) “greedily”. 

We begin by letting a2 (mod 2) be any class a(mod2) maximizing the 
quantity #{n < N :n=a(mod2) }. Having chosen a2, we set No = {n € 
N :n € a2 (mod 2) } and note that |No| < |N|/2. 

Next, we let a3 (mod 3) be any class a(mod3) maximizing the quantity 
#{n © No: n = a(mod3) }. If we set N3 = {n € No: n # a3 (mod 3) }, 
then |NV3| < (1 — 1/3)|No|. 

Continuing this way, we find that there are progressions a, (mod p) in- 
dexed by the primes p < y, such that the set 


N :={n EN : Ap < y for which n ¥ ap (mod p) } 


has cardinality 


= ea H log(H/y) 


(29.4) IW < WI If (1 . St ae 


<K 

— u" log y 
Stage 3: Large primes. Due to the nature of the second stage, we have 
completely lost control of the residual set VV’. Hence, we will apply a trivial 
argument in the third and final stage of the argument. For it to work, we 


A more efficient covering system 321 


must guarantee that the remaining primes (i.e., the primes in (Y,z]) are 
more than the number of integers in V’. We thus choose 


(29.5) y= F083 H/(3 loge 1) and ¥ = z logs 2 oS H log, H 


logy z clog H ~ 


Indeed, with this choice of parameters, we have u" log y = (log H ro), as 
well as that H log(H/Y)/(logylog Y) x H (logy H)?/(log H log, H). Apply- 
ing (29.4), and taking c small enough and z large enough in (29.2), implies 
that |N“| < 2/(Blogz) <#ALY <p < 2}: 

Since there are more primes than integers left to cover, we can easily 
complete the proof: if qi, ..., q; are the primes in (Y, z] and n1,...,ng are 
the integers in NV’, we let ag, = nj for each j < ¢, and choose ag, for £< j <k 
arbitrarily. This concludes the construction of the claimed covering system 
of congruences, and thus the proof of (29.1). 


A more efficient covering system 


We now turn to the proof of Theorem As before, we wish to find a 
system of congruences {ap (mod p)}p<, that covers [1, H], but with c being 
arbitrarily large. In view of (29.3), this suffices to prove Theorem 29.3] 


To find this more efficient covering system, we will improve upon Stage 3. 
Specifically, we will show that it is possible to choose the a,’s for p € (z/2, z] 
in a way that each congruence class covers many elements of the residual 
set N’ and not just one. 


Throughout, y and Y are defined by (29.5). However, the first stage of 
the selection of the covering system has an extra auxiliary part that deals 
with very small primes and helps simplify the situation in the last two stages. 


Stage la: Intermediate primes. We again choose a, = 0 for the primes 
p © (y,Y]. We are then left with integers n < H such that either n is 
y-smooth or n = gm with q > Y prime and m < H/Y < logz. 


Stage 1b: Very small primes. We also select a, = 0 when p < log z. 
The effect of this auxiliary stage is a simplification of the residual set, which 
now equals {n < H: pln => logz <p<y}U{Y <q< H}. Its first 
component has small size by Theorem [16.4] We will cover it trivially at the 
end of Stage 3. We thus focus on the set of primes in (Y, H]. 


Stage 2: Small primes. Next, we choose a, for p € (logz,y]. In the 
previous section, we selected these congruence classes greedily. Here, we 
simply take a, = 1. This has essentially the same effect as choosing the ay’s 


322 29. Large gaps between primes 


greedily. The reason is that g—1 looks a lot like a “random” integer, so the 
chance that it has no prime factors in 


P := f{logz<p<y} 


is about [],,<p(1 — 1/p). The advantage of having a, = 1 for all p € P is 
that it allows for an explicit description of the residual set: indeed, after 
Stage 2, we are left with certain y-smooth integers n < H and with the 
set of primes {Y <q < H: (q—1,P) =1}. We must cover them using 
congruence classes ap (mod p) with p € (Y, z]. 


Stage 3a: Large primes. This stage will be the most delicate and will 
take most of the remaining chapter to be completed. We summarize it in 
the following proposition whose proof is postponed till the next section. 


Proposition 29.5. Fixe > 0. Let z, H and y be as above. There is a 
choice of congruence classes ap (mod p), p € (z/2, z|, covering > (100 — €)% 
of the set of primes O:={z<q< H:(q—-1,P) =I}. 


Assuming the above result for now, let us see how to use it to complete 
the proof of Theorem [29.3 


Stage 3b: Large primes — cleaning up. Let Q be the set of primes de- 
fined in Proposition [29.5} A simple application of Theorem [[8.I1{a) (with 
the required level of distribution supplied by the Bombieri- Vinogradov the- 
orem) implies that 


A 1 3Cz 
IO —7 II ( —) log z 


pEP 


Now, from the above discussion, we know there are congruence classes 
ay (modp) with p € [1,Y] U (z/2,z] that cover all of [1,H] MZ, except 
perhaps the set R :={n< H: Pt(n) < y}U QU Qo, where OQ} ={q< 
z:(q-—1,P) = 1} and Q2 C OD has < |Q|/(4c) ~ z/(4logz) elements. 
Since U(H,y) < H/(log H)?+°? by Theorem [16-4] and |Q;| = o(z/log z) 
by Theorem [I8.1IKa) (see also Exercise 20.3(a)), we conclude that |R| < 
z/(3logz) < #{Y < p< z/2} as long as z is large enough. Hence, arguing 
as in Stage 3 of the previous section, we may trivially cover R using residues 
Gp (mod p) with p € (Y, z/2]. This completes the proof of Theorem [29.3 


Random covering systems of congruences 


We now turn to the proof of Proposition The main idea is to construct 
for each p € (z/2, z] a probability measure 5, on Z/pZ such that 


(29.6) ss dp(q) > —log(e/100) for all primes q € Q. 


z/2<pK<z 


Random covering systems of congruences 323 


Before explaining how to do this, let us see how such a construction yields 
Proposition [29.5 


Naturally, the measures 6, with p € (z/2, z] induce a product measure 6 
on the space G := [| Z/pZ by taking 


II dp (Gp) 


z/2<p<z 


z/2<p<z 


for each a = (dp) z/2<p<z © G. Given any g € Q, the probability that it is 


not covered by a random tuple a € G equals |], /5-,<,(1 — dp(q)). Thus 
cl#{a<Q:a¢ LU ap(modp) => TT aa 
z/2<p<z q<Q z/2<p<z 


Hence, and the inequality 1— x < e~* imply that the right-hand side 
is < e|Q|/100. In particular, there must exist a choice of a € G such that 
the number of g € Q not covered by a is < €|Q|/100, which is precisely what 
we claimed in Proposition 29-5] 


To construct measures 6, satisfying (29.6), we go back to the ideas of 
Chapter [28} we set 


8, =0 and $j = Pr(Cy)4+j—-1 II D for 7 = 2,...,k, 
D<Ch 
where C; is an auxiliary integer that will be taken to be large enough and, 
as usual, p,, denotes the nth prime. In particular, if C; > k, then the k-tuple 
(s1,..., 8%) is admissible. We also set 


v(d) = #{n(modd) : (n— 81)---(n — sp) = 0 (mod d) }. 


Next, we consider two upper ea sieve weights. We let re be the 
function \* supplied by Theorem [19.1] when applied with « = k, set of 
primes {p < log z} and level of phere D, = y'°834. We also let pe 
be the function \* from Theorem [19.1] applied with « = 2k, set of primes 

= {log z <p < y} and level of distribution Dy = y!°83 4 

In addition, we let \ : N° > R be the function constructed in Chapter 

28] with z in place of N. In particular, \ is supported on the set 


D = {d EN: dy-s-dy <D, Po (dj) > y Vi} with =D = zi/4e-ve8? 
J 


and it has bounded supremum norm. Note that log D/ log y = log, H/ log; H. 
Hence, if € is as in Lemma[28.3] Cp is as in Lemma[28.4] and Ip(f) and J(f) 
are as is in Lemma[28.6] then . satisfies the asymptotic estimates 


(29.7) S~ G(a1,--- ax)" _ J(f) + O( (logs H)*/ log, H) 


la eg 017 ~ (log D)F TT ,<,(1 — 1/p)* 


324 29. Large gaps between primes 


and 


3 Ge(ai,---,4n)* _ Ie(f) + O((logs H)?/ logs H) 
a1-++* Az (og DF [Ia (l= 1/2) 


(29.8) 


(a1,.-.,€n)ED 
ag= 


for 2=1,...,k by adapting the proof of Lemma (the only difference is 
that we must apply Lemma with w = y and u log, H/ log; H). 
Finally, we let 


Q(n,n’) = II (n— syn’), 


1<j<k 


which is the homogeneous version of the polynomial iare: — s;), and we 
introduce the sieve weights 


m=( to) Yo wo) d na). 


a|Q(n,p) b1Q(n,p)Q(n—1,p) dj [ns Pid 
SIS 


The probability measure 6, is then defined by 


in<2H, n=a (mod p) Wn,p 


dp(a) = Sie et 


for each a € Z/pZ. It is designed to be biased towards the progressions 
a(modp) containing many elements of Q. In particular, note that if n = 
q+ pse with q € Q, then the sum over (d1,...,d,) in the definition of \ can 
be restricted to those k-tuples with de = 1, analogously to the situation in 


Lemma 28.4] 

Because of this nature of the weights 6,, we ignore all summands but 
those of the form q+ psg in the numerator. This yields the inequality 
dp(q) = a Wetpsy/ yin<2H Wn,p, Whence 

k 


ay <2 W +ps 9. 
(29.9) S~> dp(q) > SSE 


W 
2/2<p<z =1 en<2H np 


Calculations 


The next step is to estimate the right-hand side of by adapting the 
methods of Chapter This is accomplished in Lemmas and 
below. A very good (albeit tough) exercise on the methods on Chapter 
is to demonstrate these lemmas without consulting their proofs. 


All implied constants from now on might depend on k. 


Calculations 325 


Lemma 29.6. Assume the above notation and let po € (z/2,z]. Then 


Sow - (J(f) + O(1/./log, #)) 


(og Die (l=—l/p)e * 


n<2H 
where V = (logy z/ log y)?* Mezice g(t = @)/»)- 


Proof. Let d,e € Y be such that dj,e;|n + pos; and po { dje; for each 
j. Arguing as in the beginning of the proof of Lemma [28.3] we find that 
(d;e;,d;e;) = 1 for all i ¢ j, provided that y > sz — s1 = sz. Therefore 


DE vam = Def (@) uF (0)A(A)A(€)N (a, 6, 4, €), 
n<x<2H a,b,d,e 
(die: pode; )=1 Vig 
where N(a,b,d,e) denotes the cardinality of integers n € [1,2H] such that 
a|Q(n, po), b|Q(n, po)Q(n ~~ 1, po) and [d;, ey] |r — Pos; for all q: 
By our assumptions on the support of a ee and A, the numbers a, b and 


[d,e1],.--,[dx, ex] can be assumed to be mutually coprime. The Chinese 
Remainder Theorem then implies that 
V1 (a)v2(b) 


N(a,b,d,e) = 2H - + O(1(a)v2(b)), 


ab TT 114;,5] 


where 1;(a) counts the number of solutions n € Z/aZ to the congruence 
Q(n, po) = 0(moda) and, similarly, v2(b) counts the number of solutions 
n € Z/bZ to the congruence Q(n, po)Q(n—1, po) = 0 (mod b). (In particular, 
these functions might depend on po.) Consequently, 


S> wnpo = Viva S + O(H"), 
n<2H 


HL; A(d)A(e) 
V;= J and Se — 
, os a 2 Thi -a[d;, e;| 


(die; ,podje; )=1 Vij 


We now estimate S. Firstly, we remove the condition from its summands 
that po { die, ---d,ex. The error produced is 


230 yy WAM ¢ oss _ os 
j=l djeEP TE oO Po z 
poldje; 


326 29. Large gaps between primes 


The remaining sum over d and e is estimated as in Lemma [28.3] as well as 
using (29.7). In conclusion, 


= Ea)? 4 ( (log H)**\ _ J(f) + O((logs H)?/ log, H) 
se Seer za y = (log D)* TI ,<y(1—1/p)* | 


Finally, we estimate V; and V2 using Theorem If we let ¢ = 
1/ logy H, then we have 


Vi=(1+0(e)) [I (1-42), Vo=(1+0(e)) [I (1-22), 


<log z log z<p<y a 
pSiog 


acY 


Since p # po when p < log z, we have 11(p) = v(p). On the other hand, we 
have v2(p) = 2k, unless p|po(si — s;)(po(si — sj) + 1) for some i 7. There 
are O(log z/ log, z) such p by Exercise [2.8{c). Therefore, 
Vo = (1 + O(€))(1 + O(1/log 2) 08/1827) TT (1 — 2k/p) 
log z<p<y 

(29.10) = (1 + O(e))(logy z/ log y)”*, 

where we used Mertens’ third estimate (Theorem [3.4{c)). This completes 
the proof of the lemma. 


Lemma 29.7. For each ¢ € {1,...,k} and each qo € Q, we have 


se a _ Vzlogy Le(f) +OU/r/logy H) 
forpeeP “Blogyz (log D)* T],<,(1 — 1/p)* 


2/2<p<Kz 
with V as in Lemma 29.6 


Proof. To ease the notation, we consider the case € = 1, in which case 
s, = 0. The other values of @ are treated similarly. 


Since go > z, any integer d, < D that divides gg must equal 1. Hence, 


mo=(O Kea) YO wal v Na). 


a|Go(p) b|Go(p)Gi(p) dj |go- si, pid; 


be 
where we have set 


Gr(n) = [] (o—h-s;n). 


2<j<k 
As in the proof of Lemma [29.6] we have 
dS Yor= YE wr (a)H3 (B)A(A)ACe) - P(a,b, de), 
z/2<p<z a,b,d,e, dj=e,=1 


(dje;,dje;)=1 ViFj 


where P(a,b,d,e) counts the number of primes p € (z/2,z] such that p { 
dye, aye -dper, a|Go(p), b|Go(p)Gi(p) and [d;, e;]|Go = S$5P for all J > 2: The 


Calculations 327 


number of primes p dividing one of di, €1, ..., dx, ex is O(log z). Hence, 
adapting the argument leading to (28.14) implies that 
Vi (a)v3 (b).X 


P(a,b,d,e) = ae 


+ O(v; (a)¥9 (0) E(z; 17) + log z), 


where r = ab Is), ej], the functions vf{(m) and v3(m) count the num- 
ber of solutions t € (Z/mZ)* to the congruences Go(t) = 0(modm) and 
Go(t)Gi(t) = 0(modm), respectively, 

X =li(z) —li(z/2) = z/(2log x) + O(z/(log z)”), 
and 


Blair) = max, |r(zs7,t) — w(z/2575 8) — X/p(r)|. 


Together with the Bombieri- Vinogradov theorem, this implies that 
a Wa+psep = Vp Vo XT + Oa(z/(log x) ) 


2/2<pKz 


for any fixed A > 0, where 


t(m)v*(m d 
ai) pe MOM) 
2 om) Se aT) 
(djei,dje;)=1 Vij 


Next, we estimate the sum over d and e exactly as in the proof of Lemma 
[28.4] and then use ‘a This yields that 


sl (logH)**\ — hi(f) + O(log; H)?/ logy H) 
2 


acy ~ (log DA“ Tpy<y(t —1/p)V 
aj=1 


As a consequence, 


1) + O( (logs H)?/ logs H) 
Do Wermsen = VIVE 306g DIF pe, = 1/P)=" 


2/2<p<z 
HL (321): 


where ¢ = 1/log, H. We note that v{(p) = v(p) — 1 for all p < logz < q@. 
Indeed, vj (p) equals #{n € (Z/pZ)* : ds; # O(modp), sjn = qo (modp) }, 
which also equals v(p) — 1 because s; = 0. We thus conclude that 


HOP) 


p<log z 


Finally, we apply Theorem to deduce that 


Vi =(1+0(e)) T] (1-2), Vet = (1+ O(c) 


1 


p<log z 7 z<p<y 


328 29. Large gaps between primes 


Moreover, we have v3(p) = 2k — 2, unless p|qo(qo — 1)s1--- sx or pi(si — 
8;)(qo(si — 8;) + sj) for some 7 4 j. Arguing as in (29.10), we find that 


Vi = (1+ O(1/log H))(P2*)" TT (1-2). 


logy log z<p<y y 


This completes the proof of the lemma. 


a are now ready to prove . We estimate the right-hand side of 
using Spina ea Assuming that J i f) > 1, we infer that 


zlogy ee Ii(f) 
3 6p(q) > (1 + o(1)) 16H log, 2 4 ee ~ 48¢ ae J(f) 


z/2<p<z 


for all gq € Q as z 4 oo. Choosing the function f as in Chapter we 
find that the sum over @ is > logk — 4loglogk + O(1). Taking k to be 
large enough in terms of c and € completes the proof of (29.6), and hence of 
Theorem 29.3 


Exercises 
Exercise 29.1. For each fixed » > 0, it is believed that 


erg 
#{ Pn S ©? Pnt1— Pn > Alga} ~ 5 
og & 


(a — oo). 


Use Cramér’s model to give evidence in support of this conjecture. 


Exercise 29.2 (Granville [64]). Let (Y,,)°2, be the Cramér-Granville model of 
parameter y, as defined in (17.13). Let E,,(A) be the event that Y; = 0 for all 
integers j € (k,k + Alog’ k]. In addition, let M = P(y)Ueslos PJ, 
(a) Let A > 0 be fixed. Prove that 
P(Hx()) = ROM? 2400 
for k = P(y),2P(y),...,MP(y) as y > oo. 


(b) If 0 < A < 2e77, show that the probability that none of the events of part (a) 
occurs is o(1) as y > co. 


Chapter 30 


Irregularities in the 
distribution of primes 


So far we concentrated our efforts on proving that the primes behave in 
the “expected way”. In this last chapter, we will show that their distribution 
has subtle irregularities that can be seen when zooming in on certain short 
intervals. 


As we discussed in Chapter 29] we expect that pn41— pn = O(log? n). 
Therefore, it seems reasonable to expect that the interval (x, x+y] contains 
the expected number of primes, namely ~ y/ log a, as soon as y > (log v)?**. 
Assuming the validity of the Riemann Hypothesis, Selberg proved in 
1943 that this is indeed true for almost all x. 


Theorem 30.1 (Selberg). Assume that the Riemann Hypothesis is true. 
Fize >0 and6>0. For all but ox,.(X) integers x € [2, X], we have 


y 


2+06 
log x ; 


(1—«) =e <am(a+y)—m(a2) < (+e) with y = (logz) 


However, in 1985 Maier arrived at the groundbreaking conclusion 
that the asymptotic formula m(a + y) — 1(x) ~ y/ log = fails infinitely often 
for y a fixed but arbitrarily large power of log x. 


Theorem 30.2 (Maier). For every fired C > 1, we have that 


M(x + (log2)%) — a(x) n(x + (log x)°) — m(z) 
raed (log z)C-1 “o amen (log a@)je-* 


The goal of this chapter is to establish Maier’s theorem. On the contrary, 
we will not show Selberg’s theorem because its proof lies beyond the scope 
of this book. 


329 


330 30. Irregularities in the distribution of primes 


Maier matrices 


The starting point for proving Theorem is the observation we made 
in Chapter that Cramér’s model has to be adjusted by presieving the 
integers under consideration with small primes. Indeed, if (logx)? < h < 2, 
then the more accurate Cramér-Granville model suggests that 


#{a<n<gat+h:(n,P(w)) =1} 1 
log x hit=2 


#{e<p<ath}~ 
pKw 

OREY ela cn sath: (n,P(w)) =1} 
~ ——— -#{r<nKX<e : (n, P(w)) = 
log x 
with w a slowly growing function of x. We strategically choose x = P(w). In 
this case, we have w ~ log. In addition, the Chinese Remainder Theorem 
implies that the number of integers in [z+1,2+h] that are coprime to P(w) 
is exactly equal to the number of integers in [1,h] that are coprime to P(w). 
In particular, if h = (log x)” with u, then 

h- Bu 
#{e<nge+h: (n,Piw))yH1}~ a 
as x — oo, where B is Buchstab’s function. Putting everything together, 
we arrive at the guess that 
h 

log x 
However, as we saw in Exercise [14.11] the difference B(u) — e~7? changes 
signs infinitely often (but with the amplitude of its oscillations tending to 0). 
Hence, we may choose u arbitrarily large such that the number of primes in 
(x, x+h] is a bit larger than expected. Similarly, we can also find arbitrarily 
large u for which the number of primes in (x, +h] is smaller than expected. 


{et <p<cath}~e'B(u)- 


In order to make the above heuristic rigorous, Maier averaged over many 
intervals (x, +h] with x a multiple of P(w). It is convenient to display 
these intervals in the form of a matrix. To this end, given positive integers 
k, 2, q and h, with 0 > 2k and h < q, we define the Maier matriz 


1+(k+1l)q 2+(k4+1)q --- h+(k+1)q 

1+(k+2)q 2+(k+2)q --- h+(k+2)q 

Meeteqny ae | PE HE HEH (e+) 
1+ eq 2+ lq ee h+ éq 


We will eventually take ¢ = P(w) for some convenient choice of w. 


lWe have presented matters in reverse chronological order. Granville’s modification of 
Cramér’s model was inspired by Maier’s work on irregularities in the distribution of primes. 


Calibrating the parameters 331 


Note that the ith row of M(k, £;q,h) contains all integers in the short 
interval ((k + 7)q,h+(k+i)q], whereas its jth column contains all integers 
in the arithmetic progression 


(30.1) {n=j(modgq):jtkq<n<j+q}. 


Now, if each arithmetic progression of the form (80.1) contains the ex- 
pected number of primes, and we write p © M(k,¢;q,h) to denote that p 
appears among the entries of M(k, ¢;q,h), then 


j+eq 
HpEM(k.Gan}~ Sf >ye 


io (4) Jj+kq logt neath; ) 15 
(j,q)=1 (Gi,q)= 


On the other hand, if all short intervals (mq,h + mq] with m € (k, ¢] 
contain the expected proportion of primes, then 


7 h 7 (£—k)h 
HpeM(Gah}~ D0 isting) ~ Toslaty 


Comparing the above estimates, we see that if we can find a sequence of 
q and A going to infinity in a that 
#Hi1l<j<h: 7g =1}3 
oe 9(q) 


then we obtain a contradiction. A similar conclusion holds if the left-hand 
side can be made < c’ < 1 for an infinite sequence of g and h. 


(30.2) Seo 


Calibrating the parameters 


As we explained above, we will let g = P(w), so that w ~ log q and q/¢(q) ~ 
e’logw, as well as h = w” with u to be chosen later in a way that the 
difference B(u)—e~7 has a predetermined sign. In order to deduce Theorem 
30.2] we need to be able to show that each arithmetic progression of the 
form contains the expected number of primes. We will take k = q? 
and ¢ = q" for some large L. Theorem [12.]] cannot be used in this range. 
However, we can apply Theorem (so we will eventually weigh primes 
logarithmically). Firstly, we claim that we may choose w in such a way 
that rules out the existence of the exceptional character .; for an infinite 
sequence of moduli of the form g = P(w). 


Lemma 30.3. There are infinitely many w € N such that if q = P(w), then 


1 log(log z/ logy) + O(1) 
d 9(q) 


y<pKz 
p=a (mod q) 


uniformly forz >y>q@ anda€ (Z/qZ)*. 


332 30. Irregularities in the distribution of primes 


Proof. For each ¢ > 3, we let Rg denote the set of real non-principal char- 
acters. In view of Theorem [27.1] it suffices to show we can choose infinitely 
many values of g = P(w 7 sich that 


(30.3) > x(p) (22y2a, X¥ © Re): 


Y<pKz 


Recall the definition of the sifted L-function Ly(s, y) from Chapter 22] 
If Qy = aeaat then Theorem implies that 


(30.4) yw (z>y > Qy, XE Ry). 


Y<pKz 
Now, let w € N. We will show that either (80.3) holds for g = P(w), or 
we can find w’ > w such that (80.3) holds for gq’ = P(w’). 


Fix an auxiliary large constant M to be chosen later, and let g = P(w). 
Firstly, we consider the case when L,(1, x) > 1/M? for all y € Rg. Since 


Depcgee 1/p = O(log M), (80.3) follows from (80.4) in this case, with the 
implicit constant depending on M. 


Assume now there is some x1 € Rg such that Lg(1, x1) < 1/M?. We will 
show that if M is sufficiently large, then we may construct another modulus 
qd = P(w’) satisfying (0.3). Precisely, we take w! = M~!log Q,,, so that 
logq’ ~ M~1 log Q,,. Note that w’ < co by Theorem [12.8 


Consider x € Ry. If x is induced by y1, then the fact that loggq’ ~ 
M~! log . and (30.4) imply that 


~ x(p) » x) | O(log M) = Oy (1) (z>y>q). 
Y<PSZ max{Qy, ,y}<p<z 


Assume now that x is not induced by x,. We then know from Theorem 
b) that there is an absolute constant c > 0 such that 


max{L,(1, x); De, x1)} ZC 
On the other hand, we claim that if M is large enough, then Ly (1,1) < ¢ 
Indeed, for allo =1+1/logz with x > we have 


logLy(o,x)= >~ x1(P) (1) 


q' <pxa 


by Lemma Applying (22.16) from Theorem [22.5] followed by (22.15), 


we can rewrite the right side of the above formula as 


logLy(o,x)= >> xP). ogy = » ERO. 


Pp 
q' <p<Qy, q' <Pp<Qy, 


Calibrating the parameters 333 


Hence, Mertens estimate implies that log Ly (o, x) = —log M + O(1). Let- 
ting « — oo implies that Ly(1,x1) < 1/M, so that Lg (1,x1) < ¢ by 
choosing a sufficiently large M. Hence, for this choice of M, we have 
Ly(1,x) = ¢ > 0. Combining this relation with proves that 
holds for x in this case too. This completes the proof of the lemma. 


Proof of Theorem Let g = P(w) be as in LemmaJ30.3] In addition, 
let h = w", and let L be a large constant to be chosen later. For brevity, we 
write M to denote the Maier matrix M(q?,q";q, h). 


One the one hand, Lemma|[30.3] implies that 
Yt= O SY Gey See 
p a P : oq) 
peM l<jxh  j+q?<p<jtq't! 1<j<h 
(,P(w))=1 p=j (mod q) (j,P(w))=1 
Hence, applying Theorem [14.4] and noticing that q/p(q) = e7 logw + O(1) 
by Mertens’s third estimate (Theorem B.4(c)), we conclude that 
1 h 
(30.5) > —- =e7’B(u)- —- (1+ O,(1/ log w)) - (log L + O(1)). 
pEeM q 
On the other hand, for each fixed u and for w — oo, we have 


(30.6) So - = S> io 3 mh ¥ ma) — anima) 


pEeM g2<m<ql mq<p<h+mq P q2@<m<qt 


since g/h — oo when w — oo. 
Now, we select u such that e7B(u) > 1 and set 6 = e7B(u) —1 > 0. We 
have l 
S> ie log L + O(1) 
g2<m<qh 
by partial summation. Together with (80.5), this implies that there are 
constants Lo = Lo(d) and wo = wo(d, wu) such that 


(30.7) S52 14+4/2) S- h/log(mq) 


mq 
pEeM ge<m<qt 


whenever L > Lo and w > wo. From now on, we suppose that L = Lo, so 
that LZ is fixed in terms of 6. 


Comparing (30.7) with (30.6), and assuming w is large enough in terms 
of 6 and u, we find that there must exist some m € ZN (q?,q"] such that 


h 
log(mq) 
Recall that h = w“, so that h ~ (logg)" Xu,5 (log(mq))” for all m € 
(7, q’). Hence, if 1 < C < u—1 and we let w be large enough in terms of 6 


(30.8) w(h + mq) — m(mq) > (1+ 6/3) 


334 30. Irregularities in the distribution of primes 


and u, then and the pigeonhole principle imply the existence of some 
x € [mq,h + mq] such that 1(x + (log x)°) — a(x) > (1+ 6/4) (log x)". 

The above discussion proves the rightmost inequality in Theorem [0.2 
for1<C <u-—l. Since u can be taken to be arbitrarily large by Exercise 
14.1i{e), we have established that for each fixed C > 1, the limsup of 
(a(x + (log 2)°—!) — a(x))/(logx)°—! is > 1 as & > 00. 

An obvious modification of the above argument, where we work with a 
sequence of u for which B(u) < e~7, proves that the leftmost inequality in 
the statement of Theorem [30.2]is also true for all C > 1. This concludes the 
proof of Maier’s theorem. 


Exercises 


Exercise 30.1 (Banks-Ford-Tao [6]). For y > 3 and u > 2, let 


_ #{s<n<sty":(n, Ply) =1} 
BY(y,u) = EN y"/ logy 


and 


< “= (n, P(y)) =1 
ain #is<n<sty": (n, Ply) =1h 
scN y"/ log y 
(a) Let 2<v <u. Show that Bt(y,u) < (1 + oy400(1))8t(y, v) and B-(y,u) > 
(1 + 0y-400(1))8™ (y, v). [Hint: Use the pigeonhole principle.] 

(b) Give a heuristic argument that justifies the following claims: 

(i) maxp, <r(Pn41 — Pn) ~ (log.2)?/[e7-B(loge,2)] as e+ 00. 

(ii) Fix u > 2. If X > oo, then 

mae m(x + (log x)") — r(x) 
X1/ logos X <a X (log z)*-? 


(30.9) ~ e7B* (log X,u) 


and 
1] uy 
(30.10) ny eee ae 
X1/log log X <e<X (log geet 
(c) Assume that for all uw > 2 and < > 0, there are yg > 3 and 6 > O such that 
[B= (y',u’) — BX (y,u)| < € when | log(y’/y)| <4, y > yo and |u! — ul <4. 
Prove that the ratio of the left over the right side of (80.9) is > 1+ 0(1), 
and that the ratio of the left over the right side of (80.10) is < 1+ o0(1). 


~ e’B (log X, u). 


Appendices 


Appendix A 


The Riemann-Stieltjes 
integral 


The Riemann-Stieltjes integral is a generalization of the Riemann inte- 
gral that is very useful in analytic number theory, because it allows us to 
transform discrete sums into integrals and thus easily manipulate them using 
our intuition from integral calculus. We present here the basic definitions 
and properties of this theory following the treatment in [38} Chapter 7]. The 
basic theory is also presented in Chapter 6] and Appendix A]. 

Consider two functions f,a : [a,b] > R, a partition P = {xo,71,...,2n} 
of [a,b], and a selection of points € = {&,...,€} with € € [x;-1,2,] for 
each 7. We then define the Riemann-Stieltjes sum of f with respect to a, 
P and € by 


S(f,0;P, €) = ys Ej): = Fai). 
Assume there is a real number J with the property that, given any e« > 0, 
there is a partition Pz such that 
|S PS) 1 <e 


whenever P is a refinement of P- (i.e., P D P-). We then say that f is 
integrable with respect to a (over [a,b]) and write symbolically f € R(a). 
The number IJ is called the Riemann-Stieltjes integral of f with respect to 
a and it is denoted by 


7 i fe [ seoveoe) 


336 


A. The Riemann-Stieltjes integral 337 


The following theorem establishes the needed properties of the Riemann- 
Stieltjes integral for the purposes of this book. Its proof is contained in 
(see Theorems 7.2, 7.3, 7.27, 7.6, 7.8 and 7.11 there, respectively). 
Theorem A.1. Let f,g,a,8: [a,b] > R andA,weER. 

(a) If f,g € Ria), then Af +g € R(a). We further have 


b b b 
/ (Af + wg)da = rf faa +p f gda. 
(b) If fe Rla)NR(G), then f E R(Aa + ZB). We further have 


[faves nsy=>f taatn fsa, 


(c) If f is continuous and a is of bounded variation, then f € R(a). 
(d) If f Ee Ria), thena€e R(f) and 


[feo = fl)a(e)| = - adf. 


(e) If f € R(a) and a is continuously differentiable on [a,b|, then the Rie- 
mann integral f° f(x)a'(x)da exists and we have 


[ seoyo0a) = [Heal 


(f) Assume that a is a step function whose only discontinutties are at the 
finitely many points x1, ..., fn € [a,b], with corresponding jumps Aa; 
= a(a;) — a(z; ). 

Assume further that, at each point x;, at least one of f and a is 
continuous from the right, and at least one of them is continuous from 
the left. 


Then f © R(a) and we have 


b n 
| fda = 5° f(a;)Aay. 
a j=l 


Appendix B 


The Fourier and the 
Mellin transforms 


We write L'(R) for the space of Lebesgue integrable functions f : R > C. 
Given such a function, we define its Fourier transform f : R — C via the 
formula 


(B.1) Fe) = f Flee Prac. 
We then have the Fourier inversion formula (7.16), p. 218]. 


Theorem B.1. If f is continuous and such that f, fe L1(R), then 


f(a) = / FlOe2"*aé. 


The condition that a € L'(R) is not always easy to verify. The simplest 
way to guarantee it is by assuming that f is smooth enough. Indeed, if 
the derivatives f, f’,..., f“ exist, are in L1(R) and tend to 0 at too, then 
integrating by parts j times in yields that 


Pp, 1 j —27riEx 1 
(B.2) fH= a. fO(ae Pde <p Tr. 


In particular, if we can take 7 = 2, then f € L1(R), so the hypotheses of 
Theorem [B. I] are met. 


Sometimes, we want to know that the Fourier inversion formula holds 
under even weaker conditions. Such a result is provided by the following 
theorem [45} Theorem 7.6, p. 220]. 


338 


B. The Fourier and the Mellin transforms 339 


Theorem B.2. Let f : R- C be piecewise continuously differentiable and 
Lebesgue integrable over R. For each x € R, we have 


f(a") oe fae _ b= im fo at je? Ede, 


Finally, a very useful property of the Fourier transform is the Poisson 
summation formula. This formula states that 
(B.3) Y= f(r) = 35 fm) 

neZ neZ 

for all “nice” functions f : R — C. There are various ways to define what 
we mean by “nice”. An easy way is to assume that f is continuously differ- 
entiable twice and that we have f(x) < 1/2? for j € {1,2} and |z| > 1, 
that is to say, f, f’ and f” decay at infinity at least as fast as the inverse 
of a quadratic polynomial. We then use relation to obtain the bound 
f(€) < 1/€ for |é| > 1. In particular, both sides of are well-defined. 

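In order to prove (B.3), we define $g(x) = \sum_{n \in \mathbb{Z}} f(x + n)$, which is a 1-periodic function in $C^2(\mathbb{R})$. Thus, Theorem 2.1 in [45, p. 35] implies that

$$g(x) = \sum_{m \in \mathbb{Z}} c_m\, e^{2\pi i m x},$$

where $c_m = \int_0^1 g(x)\, e^{-2\pi i m x}\, dx$. We then note that

$$c_m = \int_0^1 \sum_{n \in \mathbb{Z}} f(x + n)\, e^{-2\pi i m x}\, dx = \sum_{n \in \mathbb{Z}} \int_0^1 f(x + n)\, e^{-2\pi i m x}\, dx$$

by Lebesgue's Dominated Convergence Theorem, since $f(x + n) \ll 1/(1 + n^2)$ for all $x \in [0, 1]$ and all $n \in \mathbb{Z}$. Setting $y = x + n$ and noticing that $e^{2\pi i m n} = 1$, we conclude that

$$c_m = \sum_{n \in \mathbb{Z}} \int_n^{n+1} f(y)\, e^{-2\pi i m y}\, dy = \hat{f}(m).$$

Consequently,

$$\sum_{n \in \mathbb{Z}} f(x + n) = g(x) = \sum_{m \in \mathbb{Z}} \hat{f}(m)\, e^{2\pi i m x}.$$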
Taking x = 0 proves (B.3). To sum up, we have shown the following result. 


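Theorem B.3. Let $f \in C^2(\mathbb{R})$. Assume further that $f^{(j)}(x) \ll 1/x^2$ for $j \in \{0, 1, 2\}$ and $|x| \ge 1$. Then $\hat{f}(\xi) \ll 1/\xi^2$ for $|\xi| \ge 1$ and

$$\sum_{n \in \mathbb{Z}} f(n) = \sum_{n \in \mathbb{Z}} \hat{f}(n).$$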
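Theorem B.3 applies to the Gaussian $f(x) = e^{-\pi t x^2}$ with $t > 0$, whose Fourier transform is $\hat{f}(\xi) = t^{-1/2} e^{-\pi \xi^2 / t}$ (a standard computation). In that case (B.3) becomes the classical identity $\sum_n e^{-\pi t n^2} = t^{-1/2} \sum_n e^{-\pi n^2 / t}$. The sketch below checks it numerically by truncating both sums at $|n| \le N$; the cutoff `N = 50` and the helper names are illustrative assumptions, and the cutoff is harmless for the values of $t$ tried here.

```python
import math


def theta(t, N=50):
    # sum_{|n| <= N} e^{-pi t n^2}; the omitted tail is negligible for t >= 0.1
    return sum(math.exp(-math.pi * t * n * n) for n in range(-N, N + 1))


def poisson_sides(t):
    lhs = theta(t)                     # sum of f(n) for f(x) = e^{-pi t x^2}
    rhs = theta(1 / t) / math.sqrt(t)  # sum of hat f(n) = t^{-1/2} e^{-pi n^2 / t}
    return lhs, rhs


for t in (0.1, 0.5, 1.0, 3.0):
    print(t, *poisson_sides(t))
```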


¹This means that $f$ and $f'$ are piecewise continuous over $\mathbb{R}$, that is to say, they both have a discrete set of discontinuities (i.e., with no accumulation points) that are all of the first kind (i.e., “jump discontinuities”).




The Mellin transform 


Given a function $g : \mathbb{R}_{>0} \to \mathbb{C}$, we define its Mellin transform to be

$$G(s) = \int_0^{\infty} g(x)\, x^{s-1}\, dx \tag{B.4}$$

for all $s \in \mathbb{C}$ for which this integral converges. An important example of a Mellin transform is the Gamma function that we studied at the end of Chapter
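The change of variables $x = e^u$ used in the next paragraph also gives a convenient way to evaluate (B.4) numerically, since it turns the integral into $\int_{\mathbb{R}} g(e^u)\, e^{su}\, du$. The following sketch (hypothetical helper `mellin`, midpoint rule on a truncated $u$-interval chosen for this particular test) checks that the Mellin transform of $g(x) = e^{-x}$ agrees with $\Gamma(s) = \int_0^\infty e^{-x} x^{s-1}\, dx$ for a few real $s > 0$.

```python
import math


def mellin(g, s, lo=-40.0, hi=5.0, steps=20000):
    """Approximate G(s) = int_0^inf g(x) x^{s-1} dx from (B.4).
    With x = e^u this is int g(e^u) e^{s u} du, computed by the midpoint
    rule on [lo, hi]; the default interval suits g(x) = e^{-x} and s >= 1/2."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        u = lo + (k + 0.5) * h
        total += g(math.exp(u)) * math.exp(s * u)
    return total * h


def decay(x):
    return math.exp(-x)


for s in (0.5, 1.0, 2.5, 4.0):
    print(s, mellin(decay, s), math.gamma(s))
```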


The Mellin transform is a close relative of the Fourier transform. Indeed, if we make the change of variables $x = e^u$, we immediately see that $G(s) = \hat{h}(-s/(2\pi i))$ with $h(u) = g(e^u)$. In particular, we have the Mellin inversion formula as a consequence of Fourier inversion:


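Theorem B.4. Let $g$ and $G$ be as above, with $g$ piecewise continuously differentiable. Assume further that there are $\alpha_1 < \alpha_2$ such that the function $x \mapsto |g(x)|\, x^{\sigma - 1}$ is in $L^1(\mathbb{R}_{>0})$ for all $\sigma \in (\alpha_1, \alpha_2)$.

Then $G$ is a holomorphic function in the strip $\alpha_1 < \mathrm{Re}(s) < \alpha_2$. In addition, the inversion formula

$$\frac{g(x^+) + g(x^-)}{2} = \frac{1}{2\pi i} \int_{(\alpha)} G(s)\, x^{-s}\, ds := \lim_{T \to \infty} \frac{1}{2\pi i} \int_{\substack{\mathrm{Re}(s) = \alpha \\ |\mathrm{Im}(s)| \le T}} G(s)\, x^{-s}\, ds$$

holds for all $x > 0$ and all $\alpha \in (\alpha_1, \alpha_2)$.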


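Proof. Let $\varepsilon$ be positive and smaller than $(\alpha_2 - \alpha_1)/2$. By the hypotheses of the theorem, the integrals

$$\int_0^1 |g(x)|\, x^{\alpha_1 + \varepsilon - 1}\, dx \quad \text{and} \quad \int_1^{\infty} |g(x)|\, x^{\alpha_2 - \varepsilon - 1}\, dx$$

converge. Since $|x^{s-1}| \le x^{\alpha_1 + \varepsilon - 1}$ for $0 < x \le 1$ and $|x^{s-1}| \le x^{\alpha_2 - \varepsilon - 1}$ for $x \ge 1$ whenever $\alpha_1 + \varepsilon \le \mathrm{Re}(s) \le \alpha_2 - \varepsilon$, the integral defining $G(s)$ converges absolutely and uniformly in this strip. The holomorphicity of $G$ then follows. For the Mellin inversion formula, note that the function $\xi \mapsto G(\alpha - 2\pi i \xi)$ is the Fourier transform of the function $u \mapsto g(e^u)\, e^{\alpha u}$. Applying Theorem B.2 completes the proof.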
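To see Theorem B.4 in action, take $g(x) = 1/(1+x)$. The hypotheses hold with $(\alpha_1, \alpha_2) = (0, 1)$, and a classical evaluation gives $G(s) = \pi / \sin(\pi s)$ in that strip. On the line $s = \alpha + it$ we have $ds = i\, dt$, so the inversion integral equals $\frac{1}{2\pi} \int G(\alpha + it)\, x^{-\alpha - it}\, dt$; since $|G(\alpha + it)|$ decays like $e^{-\pi |t|}$, truncating at $|t| \le T$ with a moderate $T$ is harmless. The sketch below (helper names are hypothetical) compares the result with $g(x)$ for a few $x$ and $\alpha$.

```python
import cmath
import math


def G(s):
    # Mellin transform of g(x) = 1/(1+x), valid for 0 < Re(s) < 1
    return math.pi / cmath.sin(math.pi * s)


def mellin_inverse(x, alpha=0.5, T=30.0, steps=6000):
    """Approximate (1/(2 pi i)) int_{Re s = alpha, |Im s| <= T} G(s) x^{-s} ds.
    With s = alpha + i t and ds = i dt this is (1/(2 pi)) int_{-T}^{T} G(s) x^{-s} dt,
    computed here by the midpoint rule."""
    h = 2 * T / steps
    total = 0j
    for k in range(steps):
        t = -T + (k + 0.5) * h
        s = complex(alpha, t)
        total += G(s) * x ** (-s)
    return (total * h / (2 * math.pi)).real


for x in (0.25, 1.0, 3.0):
    print(x, mellin_inverse(x), 1 / (1 + x))
```

Changing `alpha` within $(0, 1)$ should not change the output, in line with the theorem's "for all $\alpha \in (\alpha_1, \alpha_2)$".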


Appendix C

The method of moments


We prove here a generalization of Theorem 15.2, which is the main probabilistic tool needed in the proof of the Erdős-Kac theorem.

Given a constant $c > 0$ and a random variable $X$ (defined on some ambient probability space), we write $X \in \mathcal{E}(c)$ if $\mathbb{P}(|X| > u) \ll e^{-cu}$ uniformly for $u \ge 0$. In addition, we write $X \in \mathcal{E}(\infty)$ if $X \in \mathcal{E}(c)$ for each fixed $c > 0$. Clearly, the standard normal distribution is in the class $\mathcal{E}(\infty)$, as well as any compactly supported distribution.


We will prove the following generalization of Theorem 15.2.

Theorem C.1. Let $X$ be a random variable in the class $\mathcal{E}(\infty)$, and let $(X_j)_{j=1}^{\infty}$ be a sequence of random variables.

(a) Assume that

$$\lim_{j \to \infty} \mathbb{E}[X_j^k] = \mathbb{E}[X^k] \quad \text{for all } k \in \mathbb{N}. \tag{C.1}$$

Then $(X_j)_{j=1}^{\infty}$ converges in distribution to $X$.

(b) Conversely, assume that $(X_j)_{j=1}^{\infty}$ converges in distribution to $X$. If, in addition, $\sup_{j \ge 1} \mathbb{E}[X_j^{2k}] < \infty$ for all $k \in \mathbb{N}$, then (C.1) holds.
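A standard instance of Theorem C.1(a) is the de Moivre-Laplace central limit theorem: if $S_j$ is binomial with parameters $j$ and $1/2$, then $X_j = (S_j - j/2)/\sqrt{j/4}$ converges in distribution to a standard normal $X$, whose moments are $\mathbb{E}[X^k] = (k-1)!!$ for even $k$ and $0$ for odd $k$. The sketch below computes $\mathbb{E}[X_j^k]$ exactly as a finite sum, so one can watch hypothesis (C.1) being satisfied as $j$ grows. The function names are hypothetical, and the closed forms used in the check, $\mathbb{E}[X_j^4] = 3 - 2/j$ and $\mathbb{E}[X_j^6] = 15 - 30/j + 16/j^2$, come from expanding $(\varepsilon_1 + \cdots + \varepsilon_j)^k$ with independent signs $\varepsilon_i = \pm 1$.

```python
import math


def binomial_moment(j, k):
    """Exact E[X_j^k] for X_j = (S_j - j/2) / sqrt(j/4), S_j ~ Binomial(j, 1/2)."""
    scale = math.sqrt(j / 4)
    total = sum(math.comb(j, s) * ((s - j / 2) / scale) ** k for s in range(j + 1))
    return total / 2 ** j


def normal_moment(k):
    # E[X^k] for X standard normal: 0 if k is odd, (k-1)!! = 1 * 3 * ... * (k-1) if k is even
    return 0 if k % 2 else math.prod(range(1, k, 2))


for j in (10, 100, 400):
    print(j, [round(binomial_moment(j, k), 4) for k in range(1, 7)])
print("limit", [normal_moment(k) for k in range(1, 7)])
```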


Before we embark on the proof of Theorem C.1, we make a few remarks.


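Remark C.2. (a) The condition that $X \in \mathcal{E}(c)$ is closely related to having $\mathbb{E}[|X|^k] \ll k!/c^k$ uniformly for $k \in \mathbb{Z}_{\ge 1}$. Indeed, if $X \in \mathcal{E}(c)$, then

$$\mathbb{E}[|X|^k] = \int_0^{\infty} k u^{k-1}\, \mathbb{P}(|X| > u)\, du \ll k \int_0^{\infty} u^{k-1} e^{-cu}\, du = \frac{k!}{c^k}. \tag{C.2}$$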




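Conversely, if the moments of $X$ satisfy the uniform bound $\mathbb{E}[|X|^k] \ll k!/c^k$, then Markov's inequality and Stirling's formula imply that

$$\mathbb{P}(|X| > u) \le u^{-k}\, \mathbb{E}[|X|^k] \ll (cu)^{-k} k! \ll \sqrt{k}\, e^{-k} (k/cu)^k$$

for all $k \in \mathbb{Z}_{\ge 1}$. Taking $k = \lfloor cu \rfloor + 1$ proves that $\mathbb{P}(|X| > u) \ll_c u^{1/2} e^{-cu}$ for all $u \ge 1$. In particular, $X \in \mathcal{E}(c')$ for all $c' < c$.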


(b) Theorem C.1 holds even if $X \in \mathcal{E}(c)$ with $0 < c < \infty$. (See Theorems 29.3 and 30.1 in Billingsley's book [7].)
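The converse argument in Remark C.2(a) can be tested on an exponential random variable $X$ with rate $c$, for which $\mathbb{P}(X > u) = e^{-cu}$ and (C.2) holds with equality, $\mathbb{E}[X^k] = k!/c^k$. The sketch below evaluates the Markov bound $u^{-k} k!/c^k$ at $k = \lfloor cu \rfloor + 1$ (in logarithms, to avoid overflow) and compares it with the true tail $e^{-cu}$ and with the shape $u^{1/2} e^{-cu}$ predicted by the remark. The function name and the constant 10 in the check are illustrative choices, not taken from the text.

```python
import math


def markov_tail_bound(u, c):
    """u^{-k} E[X^k] with E[X^k] = k!/c^k (X exponential with rate c) and
    k = floor(c u) + 1, as in Remark C.2(a); computed in logarithms."""
    k = math.floor(c * u) + 1
    return math.exp(math.lgamma(k + 1) - k * math.log(c * u))


c = 1.0
for u in (1, 5, 20, 80):
    bound = markov_tail_bound(u, c)
    # columns: u, Markov bound, true tail e^{-cu}, ratio to u^{1/2} e^{-cu}
    print(u, bound, math.exp(-c * u), bound / (math.sqrt(u) * math.exp(-c * u)))
```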


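Proof of Theorem C.1. We start with part (b), which is simpler. Convergence in distribution is equivalent to weak convergence, that is to say,

$$\lim_{j \to \infty} \mathbb{E}[f(X_j)] = \mathbb{E}[f(X)] \tag{C.3}$$

for any bounded $f \in C(\mathbb{R})$ [7, Theorem 25.8]. Hence, if we let $\phi(x) = 1$ for $|x| \le 1$, and $\phi(x) = \max\{0, 2 - |x|\}$ for $|x| > 1$, then

$$\lim_{j \to \infty} \mathbb{E}[X_j^k\, \phi(X_j/M)] = \mathbb{E}[X^k\, \phi(X/M)]$$

for any $M > 0$, since $x \mapsto x^k \phi(x/M)$ is bounded and continuous. In addition, we have

$$\big|\mathbb{E}[X_j^k] - \mathbb{E}[X_j^k\, \phi(X_j/M)]\big| \le \mathbb{E}\big[|X_j|^k\, \mathbf{1}_{|X_j| > M}\big] \le M^{-k}\, \mathbb{E}[X_j^{2k}] \ll_k 1/M$$

uniformly in $j \in \mathbb{Z}_{\ge 1}$ and $M \ge 1$, by our assumption that $\sup_{j \ge 1} \mathbb{E}[X_j^{2k}] < \infty$. Similarly, $\mathbb{E}[X^k \phi(X/M)] = \mathbb{E}[X^k] + O_k(1/M)$. We thus infer the validity of (C.1).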


We now prove part (a), where we assume that (C.1) holds. It suffices to prove that if $f$ is a smooth function supported on $[a, b]$, then (C.3) holds. We will employ an explicit version of Weierstrass's approximation theorem using Chebyshev polynomials of the first kind, defined by $T_n(\cos\theta) = \cos(n\theta)$. Using the formula $e^{i\theta} = \cos\theta + i\sin\theta$, we may easily deduce that

$$T_n(x) = \sum_{0 \le j \le n/2} \binom{n}{2j} x^{n-2j} (x^2 - 1)^j.$$

In particular, we have

$$|T_n(x)| \le 2^n |x|^n \le (2x)^{2n+2} \quad \text{for } |x| \ge 1. \tag{C.4}$$
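The expansion of $T_n$ and the bound (C.4) are easy to check numerically. The sketch below (function names are hypothetical) evaluates $T_n$ both from the binomial formula and from the three-term recurrence $T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)$, a standard identity that follows from $\cos((n+1)\theta) + \cos((n-1)\theta) = 2\cos\theta \cos(n\theta)$, and compares both with $\cos(n\theta)$ at $x = \cos\theta$.

```python
import math


def chebyshev_binomial(n, x):
    # T_n(x) = sum_{0 <= j <= n/2} C(n, 2j) x^{n-2j} (x^2 - 1)^j
    return sum(math.comb(n, 2 * j) * x ** (n - 2 * j) * (x * x - 1) ** j for j in range(n // 2 + 1))


def chebyshev_recurrence(n, x):
    # T_0 = 1, T_1 = x and T_{m+1} = 2 x T_m - T_{m-1}
    if n == 0:
        return 1.0
    prev, cur = 1.0, x
    for _ in range(n - 1):
        prev, cur = cur, 2 * x * cur - prev
    return cur


theta = 0.7
for n in range(6):
    x = math.cos(theta)
    print(n, chebyshev_binomial(n, x), chebyshev_recurrence(n, x), math.cos(n * theta))
```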
Now, fix $M$ to be a large enough parameter so that $[a, b] \subset (-M, M)$, and consider the function $\alpha \mapsto f(M\cos(2\pi\alpha))$, which is 1-periodic, even and smooth. We may thus develop it in its Fourier series, say

$$f(M\cos(2\pi\alpha)) = \sum_{n \ge 0} a_{n,M} \cos(2\pi n\alpha),$$

where

$$a_{n,M} = (1 + \mathbf{1}_{n > 0}) \int_0^1 f(M\cos(2\pi\alpha)) \cos(2\pi n\alpha)\, d\alpha.$$




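Integrating by parts twice, we find that

$$a_{n,M} = -\frac{1 + \mathbf{1}_{n > 0}}{n^2} \int_0^1 g_M(\alpha) \cos(2\pi n\alpha)\, d\alpha \qquad (n \ge 1),$$

where $g_M(\alpha) = M^2 \sin^2(2\pi\alpha)\, f''(M\cos(2\pi\alpha)) - M\cos(2\pi\alpha)\, f'(M\cos(2\pi\alpha))$. Since $f$ has bounded support, $g_M$ is supported on the set of $\alpha \in [0, 1]$ such that $\cos(2\pi\alpha) = O_f(1/M)$ (we think of $M$ as big in terms of $a$ and $b$). This set has measure $O_f(1/M)$, and $g_M = O_f(M^2)$ on it. Thus $a_{n,M} = O_f(M/n^2)$. We conclude that

$$f(M\cos(2\pi\alpha)) = \sum_{0 \le n \le M/\varepsilon} a_{n,M} \cos(2\pi n\alpha) + O_f(\varepsilon) \tag{C.5}$$

uniformly for $M \ge 1$ and $0 < \varepsilon \le 1$.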


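We use (C.5) to write $f$ in terms of the Chebyshev polynomials. If $|x| \le M$, then $x = M\cos(2\pi\alpha)$ for some $\alpha \in [0, 1]$, and thus

$$f(x) = \sum_{0 \le n \le M/\varepsilon} a_{n,M}\, T_n(x/M) + O_f(\varepsilon). \tag{C.6}$$

On the other hand, when $|x| > M$, the left-hand side of (C.6) is 0, whereas the right-hand side is $\ll_f \varepsilon + \sum_{0 \le n \le M/\varepsilon} (2x/M)^{2n+2}$ by (C.4) and the trivial bound $a_{n,M} = O_f(1)$. This proves that for all $x \in \mathbb{R}$ we have

$$f(x) = \sum_{0 \le n \le M/\varepsilon} a_{n,M}\, T_n(x/M) + O_f\Big(\varepsilon + \sum_{0 \le n \le M/\varepsilon} (2x/M)^{2n+2}\Big).$$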
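The approximation (C.6) can be observed numerically. In the sketch below, an illustration with hypothetical choices, $f$ is the standard smooth bump $e^{-1/(1-x^2)}$ supported on $[a, b] = [-1, 1]$, $M = 2$, and the sum is truncated at $n \le 400$ (which corresponds to $\varepsilon = M/400$). The coefficients $a_{n,M}$ are computed by the midpoint rule in $\alpha$, which is very accurate here because the integrand is smooth and 1-periodic, and for $|x| \le M$ we evaluate $T_n(x/M)$ as $\cos(n \arccos(x/M))$.

```python
import math


def bump(x):
    # Smooth function supported on [-1, 1] (a hypothetical choice of f)
    d = 1 - x * x
    return math.exp(-1 / d) if d > 0 else 0.0


def chebyshev_coefficients(f, M, N, K=2048):
    """a_{n,M} = (1 + 1_{n>0}) int_0^1 f(M cos 2 pi alpha) cos(2 pi n alpha) d alpha
    for 0 <= n <= N, via the midpoint rule with K nodes in alpha."""
    alphas = [(k + 0.5) / K for k in range(K)]
    nodes = [(f(M * math.cos(2 * math.pi * a)), a) for a in alphas]
    nodes = [(v, a) for v, a in nodes if v != 0.0]  # zero values contribute nothing
    coeffs = []
    for n in range(N + 1):
        s = sum(v * math.cos(2 * math.pi * n * a) for v, a in nodes)
        coeffs.append((2 if n > 0 else 1) * s / K)
    return coeffs


def chebyshev_sum(coeffs, M, x):
    # sum_n a_{n,M} T_n(x / M) for |x| <= M, using T_n(cos t) = cos(n t)
    t = math.acos(x / M)
    return sum(a * math.cos(n * t) for n, a in enumerate(coeffs))


M = 2.0
coeffs = chebyshev_coefficients(bump, M, N=400)
worst = max(abs(chebyshev_sum(coeffs, M, x / 10) - bump(x / 10)) for x in range(-20, 21))
print("max error on [-M, M]:", worst)
```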


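Applying the above formula twice, we deduce that

$$\mathbb{E}[f(X_j)] - \mathbb{E}[f(X)] = \sum_{0 \le n \le M/\varepsilon} a_{n,M} \big(\mathbb{E}[T_n(X_j/M)] - \mathbb{E}[T_n(X/M)]\big) + O_f\Big(\varepsilon + \sum_{0 \le n \le M/\varepsilon} (2/M)^{2n+2} \big(\mathbb{E}[X_j^{2n+2}] + \mathbb{E}[X^{2n+2}]\big)\Big).$$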


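When $j \to \infty$, the main term goes to 0 by assumption (C.1), since each $T_n$ is a polynomial and the sum over $n$ is finite. Thus

$$\limsup_{j \to \infty} \big|\mathbb{E}[f(X_j)] - \mathbb{E}[f(X)]\big| \ll_f \varepsilon + \sum_{0 \le n \le M/\varepsilon} (2/M)^{2n+2}\, \mathbb{E}[X^{2n+2}]. \tag{C.7}$$

Let $k = n + 1$. Since $X \in \mathcal{E}(4)$, we use (C.2) with $2k$ in place of $k$ to find

$$\sum_{1 \le k \le \sqrt{M}} (2/M)^{2k}\, \mathbb{E}[X^{2k}] \ll_X \sum_{1 \le k \le \sqrt{M}} (k/M)^{2k} \le \sum_{1 \le k \le \sqrt{M}} M^{-k} \ll 1/M.$$

For larger $k$, we use that $X \in \mathcal{E}(4\varepsilon^{-3/2})$. Hence,

$$\sum_{\sqrt{M} < k \le M/\varepsilon + 1} (2/M)^{2k}\, \mathbb{E}[X^{2k}] \ll_{X,\varepsilon} \sum_{\sqrt{M} < k \le M/\varepsilon + 1} (\varepsilon^{3/2} k/M)^{2k} \ll e^{-\sqrt{M}},$$

since $\varepsilon^{3/2} k/M \le 2\varepsilon^{1/2}$ in this range and we may assume that $\varepsilon \le 1/12$. Thus, letting $M \to \infty$ in (C.7) yields $\limsup_{j \to \infty} |\mathbb{E}[f(X_j)] - \mathbb{E}[f(X)]| \ll_f \varepsilon$. Finally, we let $\varepsilon \to 0^+$ to deduce (C.3). This completes the proof.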
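To get a feel for the last step, one can evaluate the sum on the right-hand side of (C.7) for a concrete $X \in \mathcal{E}(\infty)$. The sketch below takes $X$ standard normal, for which $\mathbb{E}[X^{2k}] = (2k-1)!! = (2k)!/(2^k k!)$ (a standard fact), writes the sum with $k = n + 1$ as in the proof, and works with logarithms to avoid overflow. With $\varepsilon$ fixed and $M$ large compared with $1/\varepsilon$, the sum shrinks as $M$ grows and, for this particular $X$, is dominated by its first term $4/M^2$. The function names are hypothetical.

```python
import math


def log_normal_even_moment(k):
    # log E[X^{2k}] for X standard normal: log((2k)! / (2^k k!))
    return math.lgamma(2 * k + 1) - k * math.log(2) - math.lgamma(k + 1)


def moment_sum(M, eps):
    """sum_{1 <= k <= M/eps + 1} (2/M)^{2k} E[X^{2k}] with X standard normal,
    i.e. the sum over 0 <= n <= M/eps in (C.7) rewritten with k = n + 1."""
    K = math.floor(M / eps) + 1
    return sum(
        math.exp(2 * k * math.log(2 / M) + log_normal_even_moment(k))
        for k in range(1, K + 1)
    )


for M in (100, 1000, 10000):
    print(M, moment_sum(M, eps=0.1))
```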


Bibliography 


[1] M. Agrawal, N. Kayal, and N. Saxena, PRIMES is in P, Ann. of Math. (2) 160 (2004), no. 2, 781-793, DOI 10.4007/annals.2004.160.781. MR2123939

[2] L. V. Ahlfors, Complex analysis: An introduction to the theory of analytic functions of one complex variable, 3rd ed., International Series in Pure and Applied Mathematics, McGraw-Hill Book Co., New York, 1978. MR510197

[3] T. M. Apostol, Mathematical analysis, 2nd ed., Addison-Wesley Publishing Co., Reading, Mass.-London-Don Mills, Ont., 1974. MR0344384

[4] T. M. Apostol, Introduction to analytic number theory, Undergraduate Texts in Mathematics, Springer-Verlag, New York-Heidelberg, 1976. MR0434929

[5] S. Axler, Linear algebra done right, 2nd ed., Undergraduate Texts in Mathematics, Springer-Verlag, New York, 1997. MR1482226

[6] W. Banks, K. Ford and T. Tao, Large prime gaps and probabilistic models. Preprint (2019), 38 pages, arXiv:1908.08613.

[7] P. Billingsley, Probability and measure, 3rd ed., Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York, 1995. A Wiley-Interscience Publication. MR1324786

[8] E. Bombieri, On the large sieve, Mathematika 12 (1965), 201-225, DOI 10.1112/S0025579300005313. MR0197425

[9] E. Bombieri, The asymptotic sieve (English, with Italian summary), Rend. Accad. Naz. XL (5) 1/2 (1975/76), 243-269 (1977). MR0491570

[10] E. Bombieri, Le grand crible dans la théorie analytique des nombres (French, with English summary), Astérisque 18 (1987), 103. MR891718

[11] E. Bombieri, J. B. Friedlander, and H. Iwaniec, Primes in arithmetic progressions to large moduli, Acta Math. 156 (1986), no. 3-4, 203-251, DOI 10.1007/BF02399204. MR834613

[12] E. Bombieri, J. B. Friedlander, and H. Iwaniec, Primes in arithmetic progressions to large moduli. II, Math. Ann. 277 (1987), no. 3, 361-393, DOI 10.1007/BF01458321. MR891581

[13] E. Bombieri, J. B. Friedlander, and H. Iwaniec, Primes in arithmetic progressions to large moduli. III, J. Amer. Math. Soc. 2 (1989), no. 2, 215-224, DOI 10.2307/1990976. MR976723

[14] E. Bombieri and H. Iwaniec, On the order of ζ(1/2 + it), Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 13 (1986), no. 3, 449-472. MR881101


[15] E. Bombieri and H. Iwaniec, Some mean-value theorems for exponential sums, Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 13 (1986), no. 3, 473-486. MR881102

[16] J. Bourgain, Decoupling, exponential sums and the Riemann zeta function, J. Amer. Math. Soc. 30 (2017), no. 1, 205-224, DOI 10.1090/jams/860. MR3556291

[17] N. G. de Bruijn, On the number of positive integers ≤ x and free of prime factors > y, Nederl. Akad. Wetensch. Proc. Ser. A. 54 (1951), 50-60. MR0046375

[18] N. G. de Bruijn, On the number of positive integers ≤ x and free prime factors > y. II, Nederl. Akad. Wetensch. Proc. Ser. A 69=Indag. Math. 28 (1966), 239-247. MR0205945

[19] V. Brun, Über das Goldbachsche Gesetz und die Anzahl der Primzahlpaare, Archiv for Math. og Naturvid. 34 (1915), no. 8, 19 pp.

[20] V. Brun, La série 1/5 + 1/7 + 1/11 + 1/13 + 1/17 + 1/19 + 1/29 + 1/31 + 1/41 + 1/43 + 1/59 + 1/61 + ··· où les dénominateurs sont “nombres premiers jumeaux” est convergente ou finie, Bull. Sci. Math. (2) 43 (1919), 100-104; 124-128.

[21] V. Brun, Reflections on the sieve of Eratosthenes, Norske Vid. Selsk. Skr. (Trondheim) 1967 (1967), no. 1, 9. MR0219466

[22] A. A. Buchstab, Asymptotic estimates of a general number-theoretic function (Russian), Mat. Sb. (2) 44 (1937), 1239-1246.

[23] A. A. Buchstab, New improvements in the method of the sieve of Eratosthenes, Mat. Sb. (N.S.) 4 (46) (1938), 375-387.

[24] Jing-run Chen, On the representation of a large even integer as the sum of a prime and the product of at most two primes, Kexue Tongbao (Foreign Lang. Ed.) 17 (1966), 385-386. MR0207668

[25] J. R. Chen, On the representation of a large even integer as the sum of a prime and the product of at most two primes. II, Sci. Sinica 21 (1978), no. 4, 421-430. MR511293

[26] A. C. Cojocaru and M. R. Murty, An introduction to sieve methods and their applications, London Mathematical Society Student Texts, vol. 66, Cambridge University Press, Cambridge, 2006. MR2200366

[27] H. Cramér, Some theorems concerning prime numbers, Arkiv för Mat. Astr. o. Fys. 15 (1920), no. 5, 1-32.

[28] H. Cramér, On the distribution of primes, Proc. Camb. Phil. Soc. 20 (1920), 272-280.

[29] H. Cramér, Prime numbers and probability, Skand. Mat.-Kongr. 8 (1935), 107-115.

[30] H. Cramér, On the order of magnitude of the difference between consecutive prime numbers, Acta Arith. 2 (1936), 23-46.

[31] H. Davenport, Multiplicative number theory, 3rd ed., Graduate Texts in Mathematics, vol. 74, Springer-Verlag, New York, 2000. Revised and with a preface by Hugh L. Montgomery. MR1790423

[32] H. Delange, Sur des formules dues à Atle Selberg (French), Bull. Sci. Math. (2) 83 (1959), 101-111. MR0113836

[33] H. G. Diamond and H. Halberstam, A higher-dimensional sieve method, Cambridge Tracts in Mathematics, vol. 177, Cambridge University Press, Cambridge, 2008. With an appendix (“Procedures for computing sieve functions”) by William F. Galway. MR2458547

[34] H. Diamond, H. Halberstam, and H.-E. Richert, Combinatorial sieves of dimension exceeding one, J. Number Theory 28 (1988), no. 3, 306-346, DOI 10.1016/0022-314X(88)90046-7. MR982579

[35] K. Dickman, On the frequency of numbers containing prime factors of a certain relative magnitude, Ark. Mat. Astr. Fys. 22 (1930), 1-14.

[36] A. Elbert, Some recent results on the zeros of Bessel functions and orthogonal polynomials, Proceedings of the Fifth International Symposium on Orthogonal Polynomials, Special Functions and their Applications (Patras, 1999), J. Comput. Appl. Math. 133 (2001), no. 1-2, 65-83, DOI 10.1016/S0377-0427(00)00635-X. MR1858270


[37] H. M. Edwards, Riemann's zeta function, Dover Publications, Inc., Mineola, NY, 2001. Reprint of the 1974 original [Academic Press, New York; MR0466039 (57 #5922)]. MR1854455

[38] P. D. T. A. Elliott, Probabilistic number theory. I: Mean-value theorems, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Science], vol. 239, Springer-Verlag, New York-Berlin, 1979. MR551361

[39] P. D. T. A. Elliott, Probabilistic number theory. II: Central limit theorems, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 240, Springer-Verlag, Berlin-New York, 1980. MR560507

[40] P. D. T. A. Elliott, Multiplicative functions on arithmetic progressions. VII. Large moduli, J. London Math. Soc. (2) 66 (2002), no. 1, 14-28, DOI 10.1112/S0024610702003228. MR1911217

[41] P. D. T. A. Elliott and H. Halberstam, A conjecture in prime number theory, Symposia Mathematica, Vol. IV (INDAM, Rome, 1968/69), Academic Press, London, 1970, pp. 59-72. MR0276195

[42] P. Erdős, The difference of consecutive primes, Duke Math. J. 6 (1940), 438-441. MR1759

[43] P. Erdős and M. Kac, The Gaussian law of errors in the theory of additive number theoretic functions, Amer. J. Math. 62 (1940), 738-742, DOI 10.2307/2371483. MR0002374

[44] L. Euler, Commentationes Arithmeticae. V. 3 (Latin), Leonhardi Euleri Opera Omnia (1) 4, Orell Füssli, Zürich; B. G. Teubner, Leipzig, 1941. Edited by Rudolf Fueter. MR0006112

[45] G. B. Folland, Fourier analysis and its applications, The Wadsworth & Brooks/Cole Mathematics Series, Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA, 1992. MR1145236

[46] K. Ford, B. Green, S. Konyagin, and T. Tao, Large gaps between consecutive prime numbers, Ann. of Math. (2) 183 (2016), no. 3, 935-974, DOI 10.4007/annals.2016.183.3.4. MR3488740

[47] K. Ford, B. Green, S. Konyagin, J. Maynard, and T. Tao, Long gaps between primes, J. Amer. Math. Soc. 31 (2018), no. 1, 65-105, DOI 10.1090/jams/876. MR3718451

[48] K. Ford and H. Halberstam, The Brun-Hooley sieve, J. Number Theory 81 (2000), no. 2, 335-350, DOI 10.1006/jnth.1999.2479. MR1752258

[49] E. Fouvry, Répartition des suites dans les progressions arithmétiques (French), Acta Arith. 41 (1982), no. 4, 359-382, DOI 10.4064/aa-41-4-359-382. MR677549

[50] E. Fouvry, Autour du théorème de Bombieri-Vinogradov (French), Acta Math. 152 (1984), no. 3-4, 219-244, DOI 10.1007/BF02392198. MR741055

[51] E. Fouvry and H. Iwaniec, On a theorem of Bombieri-Vinogradov type, Mathematika 27 (1980), no. 2, 135-152 (1981), DOI 10.1112/S0025579300010032. MR610700

[52] E. Fouvry and H. Iwaniec, Primes in arithmetic progressions, Acta Arith. 42 (1983), no. 2, 197-218, DOI 10.4064/aa-42-2-197-218. MR719249

[53] J. Friedlander and A. Granville, Limitations to the equi-distribution of primes. I, Ann. of Math. (2) 129 (1989), no. 2, 363-382, DOI 10.2307/1971450. MR986796

[54] J. Friedlander and A. Granville, Limitations to the equi-distribution of primes. III, Compositio Math. 81 (1992), no. 1, 19-32. MR1145606

[55] J. Friedlander, A. Granville, A. Hildebrand, and H. Maier, Oscillation theorems for primes in arithmetic progressions and for sifting functions, J. Amer. Math. Soc. 4 (1991), no. 1, 25-86, DOI 10.2307/2939254. MR1080647

[56] J. Friedlander and H. Iwaniec, On Bombieri's asymptotic sieve, Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 5 (1978), no. 4, 719-756. MR519891

[57] J. Friedlander and H. Iwaniec, Asymptotic sieve for primes, Ann. of Math. (2) 148 (1998), no. 3, 1041-1065, DOI 10.2307/121035. MR1670069

[58] J. Friedlander and H. Iwaniec, The polynomial X² + Y⁴ captures its primes, Ann. of Math. (2) 148 (1998), no. 3, 945-1040, DOI 10.2307/121034. MR1670065


[59] J. Friedlander and H. Iwaniec, Opera de cribro, American Mathematical Society Colloquium Publications, vol. 57, American Mathematical Society, Providence, RI, 2010. MR2647984

[60] D. M. Goldfeld, A simple proof of Siegel's theorem, Proc. Nat. Acad. Sci. U.S.A. 71 (1974), 1055, DOI 10.1073/pnas.71.4.1055. MR0344222

[61] L. Goldmakher, Multiplicative mimicry and improvements to the Pólya-Vinogradov inequality, Algebra Number Theory 6 (2012), no. 1, 123-163, DOI 10.2140/ant.2012.6.123. MR2950162

[62] D. A. Goldston, S. W. Graham, J. Pintz, and C. Y. Yıldırım, Small gaps between primes or almost primes, Trans. Amer. Math. Soc. 361 (2009), no. 10, 5285-5330, DOI 10.1090/S0002-9947-09-04788-6. MR2515812

[63] D. A. Goldston, J. Pintz, and C. Y. Yıldırım, Primes in tuples. I, Ann. of Math. (2) 170 (2009), no. 2, 819-862, DOI 10.4007/annals.2009.170.819. MR2552109

[64] A. Granville, Harald Cramér and the distribution of prime numbers, Scand. Actuar. J. 1 (1995), 12-28, DOI 10.1080/03461238.1995.10413946. Harald Cramér Symposium (Stockholm, 1993). MR1349149

[65] A. Granville, Primes in intervals of bounded length, Bull. Amer. Math. Soc. (N.S.) 52 (2015), no. 2, 171-222, DOI 10.1090/S0273-0979-2015-01480-1. MR3312631

[66] A. Granville, A. J. Harper, and K. Soundararajan, Mean values of multiplicative functions over function fields, Res. Number Theory 1 (2015), Art. 25, 18, DOI 10.1007/s40993-015-0023-5. MR3501009

[67] A. Granville, D. M. Kane, D. Koukoulopoulos, and R. J. Lemke Oliver, Best possible densities of Dickson m-tuples, as a consequence of Zhang-Maynard-Tao, Analytic number theory, Springer, Cham, 2015, pp. 133-144. MR3467396

[68] A. Granville and D. Koukoulopoulos, Beyond the LSD method for the partial sums of multiplicative functions, Ramanujan J. 49 (2019), no. 2, 287-319, DOI 10.1007/s11139-018-0119-3. MR3949071

[69] A. Granville, D. Koukoulopoulos, and K. Matomäki, When the sieve works, Duke Math. J. 164 (2015), no. 10, 1935-1969, DOI 10.1215/00127094-3120891. MR3369306

[70] A. Granville and G. Martin, Prime number races, Amer. Math. Monthly 113 (2006), no. 1, 1-33, DOI 10.2307/27641834. MR2202918

[71] A. Granville and K. Soundararajan, An uncertainty principle for arithmetic sequences, Ann. of Math. (2) 165 (2007), no. 2, 593-635, DOI 10.4007/annals.2007.165.593. MR2299742

[72] A. Granville and K. Soundararajan, Large character sums: pretentious characters and the Pólya-Vinogradov theorem, J. Amer. Math. Soc. 20 (2007), no. 2, 357-384, DOI 10.1090/S0894-0347-06-00536-4. MR2276774

[73] A. Granville and K. Soundararajan, Sieving and the Erdős-Kac theorem, Equidistribution in number theory, an introduction, NATO Sci. Ser. II Math. Phys. Chem., vol. 237, Springer, Dordrecht, 2007, pp. 15-27, DOI 10.1007/978-1-4020-5404-4_2. MR2290492

[74] A. Granville and K. Soundararajan, Pretentious multiplicative functions and an inequality for the zeta-function, Anatomy of integers, CRM Proc. Lecture Notes, vol. 46, Amer. Math. Soc., Providence, RI, 2008, pp. 191-197. MR2437976

[75] A. Granville and K. Soundararajan, Multiplicative number theory, Snowbird MRC notes (unpublished), 2011.

[76] G. Greaves, Sieves in number theory, Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], vol. 43, Springer-Verlag, Berlin, 2001. MR1836967

[77] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of Math. (2) 167 (2008), no. 2, 481-547, DOI 10.4007/annals.2008.167.481. MR2415379

[78] B. Green and T. Tao, Linear equations in primes, Ann. of Math. (2) 171 (2010), no. 3, 1753-1850, DOI 10.4007/annals.2010.171.1753. MR2680398


[79] B. Green and T. Tao, The Möbius function is strongly orthogonal to nilsequences, Ann. of Math. (2) 175 (2012), no. 2, 541-566, DOI 10.4007/annals.2012.175.2.3. MR2877066

[80] B. Green, T. Tao, and T. Ziegler, An inverse theorem for the Gowers U^{s+1}[N]-norm, Ann. of Math. (2) 176 (2012), no. 2, 1231-1372, DOI 10.4007/annals.2012.176.2.11. MR2950773

[81] J. Hadamard, Étude sur les propriétés des fonctions entières et en particulier d'une fonction considérée par Riemann, J. Math. Pures Appl. (4) 9 (1893), 171-215.

[82] J. Hadamard, Sur la distribution des zéros de la fonction ζ(s) et ses conséquences arithmétiques (French), Bull. Soc. Math. France 24 (1896), 199-220. MR1504264

[83] J. Hadamard, Sur la distribution des zéros de la fonction ζ(s) et ses conséquences arithmétiques (French), Bull. Soc. Math. France 24 (1896), 199-220. MR1504264

[84] G. Halász, On the distribution of additive and the mean values of multiplicative arithmetic functions, Studia Sci. Math. Hungar. 6 (1971), 211-233. MR0319930

[85] D. K. Faddeyev, S. M. Lozinsky, and A. V. Malyshev, Yuri V. Linnik (1915-1972): a biographical note, Acta Arith. 27 (1975), 1-2, DOI 10.4064/aa-27-1-1-2. Collection of articles in memory of Juriĭ Vladimirovič Linnik. MR0421941

[86] H. Halberstam and H.-E. Richert, Sieve methods, London Mathematical Society Monographs, vol. 4, Academic Press [A subsidiary of Harcourt Brace Jovanovich, Publishers], London-New York, 1974. MR0424730

[87] G. H. Hardy and J. E. Littlewood, A new solution to Waring's problem, Q. J. Math. 48 (1919), 272-293.

[88] G. H. Hardy and J. E. Littlewood, Some problems of “partitio numerorum”: I. A new solution to Waring's problem, Göttingen Nachrichten, 1920, 33-54.

[89] G. H. Hardy and J. E. Littlewood, Some problems of “partitio numerorum”: II. Proof that every large number is the sum of at most 21 biquadrates, Math. Z. 9 (1921), no. 1-2, 14-27, DOI 10.1007/BF01378332. MR1544448

[90] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio numerorum’; III: On the expression of a number as a sum of primes, Acta Math. 44 (1923), no. 1, 1-70, DOI 10.1007/BF02403921. MR1555183

[91] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio Numerorum’: IV. The singular series in Waring's Problem and the value of the number G(k), Math. Z. 12 (1922), no. 1, 161-188, DOI 10.1007/BF01482074. MR1544511

[92] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio numerorum’: V. A further contribution to the study of Goldbach's problem, Proc. London Math. Soc. (2) 22 (1924), 46-56.

[93] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio numerorum’ (VI): Further researches in Waring's Problem, Math. Z. 23 (1925), no. 1, 1-37, DOI 10.1007/BF01506218. MR1544728

[94] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio Numerorum’ (VIII): The number Γ(k) in Waring's Problem, Proc. London Math. Soc. (2) 28 (1928), no. 7, 518-542, DOI 10.1112/plms/s2-28.1.518. MR1575871

[95] G. H. Hardy and S. Ramanujan, Proof that almost all numbers n are composed of about log log n prime factors [Proc. London Math. Soc. (2) 16 (1917), Records for 14 Dec. 1916], Collected papers of Srinivasa Ramanujan, AMS Chelsea Publ., Providence, RI, 2000, pp. 242-243. MR2280875

[96] G. Harman, Prime-detecting sieves, London Mathematical Society Monographs Series, vol. 33, Princeton University Press, Princeton, NJ, 2007. MR2331072

[97] D. R. Heath-Brown, Prime numbers in short intervals and a generalized Vaughan identity, Canad. J. Math. 34 (1982), no. 6, 1365-1377, DOI 10.4153/CJM-1982-095-9. MR678676

[98] D. R. Heath-Brown, Primes represented by x³ + 2y³, Acta Math. 186 (2001), no. 1, 1-84, DOI 10.1007/BF02392715. MR1828372


[99] D. R. Heath-Brown and X. Li, Prime values of a² + p⁴, Invent. Math. 208 (2017), no. 2, 441-499, DOI 10.1007/s00222-016-0694-0. MR3639597

[100] D. R. Heath-Brown and B. Z. Moroz, Primes represented by binary cubic forms, Proc. London Math. Soc. (3) 84 (2002), no. 2, 257-288, DOI 10.1112/plms/84.2.257. MR1881392

[101] D. R. Heath-Brown and B. Z. Moroz, On the representation of primes by cubic polynomials in two variables, Proc. London Math. Soc. (3) 88 (2004), no. 2, 289-312, DOI 10.1112/S0024611503014497. MR2032509

[102] A. Hildebrand, Integers free of large prime factors and the Riemann hypothesis, Mathematika 31 (1984), no. 2, 258-271 (1985), DOI 10.1112/S0025579300012481. MR804201

[103] A. Hildebrand and G. Tenenbaum, On integers free of large prime factors, Trans. Amer. Math. Soc. 296 (1986), no. 1, 265-290, DOI 10.2307/2000573. MR837811

[104] A. Hildebrand and G. Tenenbaum, Integers without large prime factors, J. Théor. Nombres Bordeaux 5 (1993), no. 2, 411-484. MR1265913

[105] C. Hooley, On the Brun-Titchmarsh theorem, J. Reine Angew. Math. 255 (1972), 60-79, DOI 10.1515/crll.1972.255.60. MR0304328

[106] C. Hooley, On the Brun-Titchmarsh theorem. II, Proc. London Math. Soc. (3) 30 (1975), 114-128, DOI 10.1112/plms/s3-30.1.114. MR0369296

[107] C. Hooley, Applications of sieve methods to the theory of numbers, Cambridge Tracts in Mathematics, vol. 70, Cambridge University Press, Cambridge-New York-Melbourne, 1976. MR0404173

[108] C. Hooley, On an almost pure sieve, Acta Arith. 66 (1994), no. 4, 359-368, DOI 10.4064/aa-66-4-359-368. MR1288352

[109] M. N. Huxley, Area, lattice points, and exponential sums, London Mathematical Society Monographs. New Series, vol. 13, The Clarendon Press, Oxford University Press, New York, 1996. Oxford Science Publications. MR1420620

[110] E. K. Ifantis and P. D. Siafarikas, A differential equation for the zeros of Bessel functions, Applicable Anal. 20 (1985), no. 3-4, 269-281, DOI 10.1080/00036818508839574. MR814954

[111] A. E. Ingham, The distribution of prime numbers, Cambridge Mathematical Library, Cambridge University Press, Cambridge, 1990. Reprint of the 1932 original; With a foreword by R. C. Vaughan. MR1074573

[112] H. Iwaniec, Rosser's sieve, Acta Arith. 36 (1980), no. 2, 171-202, DOI 10.4064/aa-36-2-171-202. MR581917

[113] H. Iwaniec, A new form of the error term in the linear sieve, Acta Arith. 37 (1980), 307-320, DOI 10.4064/aa-37-1-307-320. MR598883

[114] H. Iwaniec and E. Kowalski, Analytic number theory, American Mathematical Society Colloquium Publications, vol. 53, American Mathematical Society, Providence, RI, 2004. MR2061214

[115] W. B. Jurkat and H.-E. Richert, An improvement of Selberg's sieve method. I, Acta Arith. 11 (1965), 217-240, DOI 10.4064/aa-11-2-217-240. MR0202680

[116] M. Kac, Statistical independence in probability, analysis and number theory, The Carus Mathematical Monographs, No. 12, Published by the Mathematical Association of America. Distributed by John Wiley and Sons, Inc., New York, 1959. MR0110114

[117] A. Khintchine, Über das Gesetz der großen Zahlen (German), Math. Ann. 96 (1927), no. 1, 152-168, DOI 10.1007/BF01209158. MR1512310

[118] N. M. Korobov, Weyl's estimates of sums and the distribution of primes (Russian), Dokl. Akad. Nauk SSSR 123 (1958), 28-31. MR0103862

[119] D. Koukoulopoulos, Pretentious multiplicative functions and the prime number theorem for arithmetic progressions, Compos. Math. 149 (2013), no. 7, 1129-1149, DOI 10.1112/S0010437X12000802. MR3078641


[120] D. Koukoulopoulos, On multiplicative functions which are small on average, Geom. Funct. Anal. 23 (2013), no. 5, 1569-1630, DOI 10.1007/s00039-013-0235-6. MR3102913

[121] E. Kowalski, Gaps between prime numbers and prime numbers in arithmetic progressions, after Y. Zhang and J. Maynard, Survey (Bourbaki seminar, March 2014).

[122] J. Kubilius, Probabilistic methods in the theory of numbers, Translations of Mathematical Monographs, Vol. 11, American Mathematical Society, Providence, R.I., 1964. MR0160745

[123] Y. Lamzouri and A. P. Mangerel, Large odd order character sums and improvements of the Pólya-Vinogradov inequality. Preprint (2017), 34 pages, arXiv:1701.01042.

[124] E. Landau, Über den Zusammenhang einiger neuer Sätze der analytischen Zahlentheorie, Wiener Sitzungsberichte, Math. Klasse 115 (1906), 589-632.

[125] E. Landau, Neuer Beweis des Primzahlsatzes und Beweis des Primidealsatzes (German), Math. Ann. 56 (1903), no. 4, 645-670, DOI 10.1007/BF01444310. MR1511191

[126] E. Landau, Handbuch der Lehre von der Verteilung der Primzahlen (German), Teubner, Leipzig-Berlin, 1909.

[127] E. Landau, Lösung des Lehmer'schen Problems (German), Amer. J. Math. 31 (1909), no. 1, 86-102, DOI 10.2307/2370180. MR1506062

[128] E. Landau, Über die Wurzeln der Zetafunktion (German), Math. Z. 20 (1924), no. 1, 98-104, DOI 10.1007/BF01188073. MR1544664

[129] E. Landau, Über die Einteilung der positiven ganzen Zahlen in vier Klassen nach der Mindestzahl der zu ihrer additiven Zusammensetzung erforderlichen Quadrate, Arch. Math. Phys. (3) 13 (1908), 305-312; Collected Works, Vol. 4, Essen: Thales Verlag, 1986, pp. 59-66.

[130] A. F. Lavrik, The approximate functional equation for Dirichlet L-functions (Russian), Trudy Moskov. Mat. Obšč. 18 (1968), 91-104. MR0236126

[131] U. V. Linnik, “The large sieve”, C. R. (Doklady) Acad. Sci. URSS (N.S.) 30 (1941), 292-294. MR0004266

[132] U. V. Linnik, On the least prime in an arithmetic progression. I. The basic theorem (English, with Russian summary), Rec. Math. [Mat. Sbornik] N.S. 15(57) (1944), 139-178. MR0012111

[133] U. V. Linnik, On the least prime in an arithmetic progression. II. The Deuring-Heilbronn phenomenon (English, with Russian summary), Rec. Math. [Mat. Sbornik] N.S. 15(57) (1944), 347-368. MR0012112

[134] H. Maier, Primes in short intervals, Michigan Math. J. 32 (1985), no. 2, 221-225, DOI 10.1307/mmj/1029003189. MR783576

[135] H. Maier and C. Pomerance, Unusually large gaps between consecutive primes, Trans. Amer. Math. Soc. 322 (1990), no. 1, 201-237, DOI 10.2307/2001529. MR972703

[136] H. von Mangoldt, Zu Riemanns Abhandlung “Ueber die Anzahl der Primzahlen unter einer gegebenen Grösse” (German), J. Reine Angew. Math. 114 (1895), 255-305, DOI 10.1515/crll.1895.114.255. MR1580379

[137] D. A. Marcus, Number fields, Universitext, Springer, Cham, 2018. Second edition of [MR0457396]; With a foreword by Barry Mazur. MR3822326

[138] J. Maynard, Small gaps between primes, Ann. of Math. (2) 181 (2015), no. 1, 383-413, DOI 10.4007/annals.2015.181.1.7. MR3272929

[139] J. Maynard, Large gaps between primes, Ann. of Math. (2) 183 (2016), no. 3, 915-933, DOI 10.4007/annals.2016.183.3.3. MR3488739

[140] J. Maynard, Dense clusters of primes in subsets, Compos. Math. 152 (2016), no. 7, 1517-1554, DOI 10.1112/S0010437X16007296. MR3530450

[141] J. Maynard, Primes represented by incomplete norm forms, Preprint (2015), 56 pages, arXiv:1507.05080.

[142] F. Mertens, Ein Beitrag zur analytischen Zahlentheorie (German), J. Reine Angew. Math. 78 (1874), 46-62, DOI 10.1515/crll.1874.78.46. MR1579612


Bibliography 351 


[143] H. L. Montgomery, Problems concerning prime numbers, Mathematical developments arising from Hilbert problems (Proc. Sympos. Pure Math., Northern Illinois Univ., De Kalb, Ill., 1974), Amer. Math. Soc., Providence, R. I., 1976, pp. 307-310. MR0427249

[144] H. L. Montgomery, Ten lectures on the interface between analytic number theory and harmonic analysis, CBMS Regional Conference Series in Mathematics, vol. 84, Published for the Conference Board of the Mathematical Sciences, Washington, DC; by the American Mathematical Society, Providence, RI, 1994. MR1297543

[145] H. L. Montgomery and R. C. Vaughan, The large sieve, Mathematika 20 (1973), 119-134, DOI 10.1112/S0025579300004708. MR0374060

[146] H. L. Montgomery and R. C. Vaughan, Multiplicative number theory. I. Classical theory, Cambridge Studies in Advanced Mathematics, vol. 97, Cambridge University Press, Cambridge, 2007. MR2378655

[147] M. Nair, On Chebyshev-type inequalities for primes, Amer. Math. Monthly 89 (1982), no. 2, 126-129, DOI 10.2307/2320934. MR643279

[148] J. Neukirch, Algebraic number theory, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 322, Springer-Verlag, Berlin, 1999. Translated from the 1992 German original and with a note by Norbert Schappacher; With a foreword by G. Harder. MR1697859

[149] R. E. A. C. Paley, A theorem on characters, J. London Math. Soc. 7 (1932), no. 1, 28-32, DOI 10.1112/jlms/s1-7.1.28. MR1574456

[150] J. Pintz, Very large gaps between consecutive primes, J. Number Theory 63 (1997), no. 2, 286-301, DOI 10.1006/jnth.1997.2081. MR1443763

[151] D. H. J. Polymath, Variants of the Selberg sieve, and bounded intervals containing many primes, Res. Math. Sci. 1 (2014), Art. 12, 83, DOI 10.1186/s40687-014-0012-7. MR3373710

[152] R. A. Rankin, The difference between consecutive prime numbers, J. London Math. Soc. 13 (1938), no. 4, 242-245, DOI 10.1112/jlms/s1-13.4.242. MR1574971

[153] R. A. Rankin, The difference between consecutive prime numbers. V, Proc. Edinburgh Math. Soc. (2) 13 (1962/1963), 331-332, DOI 10.1017/S0013091500025633. MR0160767

[154] B. Riemann, Über die Anzahl der Primzahlen unter einer gegebenen Grösse, Monatsberichte der Berliner Akademie. In Gesammelte Werke, Teubner, Leipzig (1892), Reprinted by Dover, New York (1953). Original manuscript (with English translation). Reprinted in (Borwein et al. 2008) and (Edwards 1974).

[155] B. Riemann, Unpublished papers, Handschriftenabteilung Niedersächsische Staats- und Universitätsbibliothek, Göttingen.

[156] G. Rodriquez, Sul problema dei divisori di Titchmarsh (Italian, with English summary), Boll. Un. Mat. Ital. (3) 20 (1965), 358-366. MR0197409

[157] M. Rubinstein and P. Sarnak, Chebyshev’s bias, Experiment. Math. 3 (1994), no. 3, 173-197. MR1329368

[158] W. Rudin, Principles of mathematical analysis, 3rd ed., International Series in Pure and Applied Mathematics, McGraw-Hill Book Co., New York-Auckland-Düsseldorf, 1976. MR0385023

[159] W. Rudin, Real and complex analysis, 3rd ed., McGraw-Hill Book Co., New York, 1987. MR924157

[160] L. G. Sathe, On a problem of Hardy on the distribution of integers having a given number of prime factors. II, J. Indian Math. Soc. (N.S.) 17 (1953), 83-141. MR0058632

[161] A. Selberg, On the normal density of primes in small intervals, and the difference between consecutive primes, Arch. Math. Naturvid. 47 (1943), no. 6, 87-105. MR12624

[162] A. Selberg, Note on a paper by L. G. Sathe, J. Indian Math. Soc. (N.S.) 18 (1954), 83-87. MR0067143




[163] A. Selberg, Collected papers. Vol. II, Springer-Verlag, Berlin, 1991. With a foreword by K. Chandrasekharan. MR1295844

[164] C. L. Siegel, Über Riemanns Nachlass zur analytischen Zahlentheorie, Quellen und Studien zur Geschichte der Math. Astron. und Phys. Abt. B: Studien 2 (1932), 45-80.

[165] P. Shiu, A Brun-Titchmarsh theorem for multiplicative functions, J. Reine Angew. Math. 313 (1980), 161-170, DOI 10.1515/crll.1980.313.161. MR552470

[166] K. Soundararajan, Small gaps between prime numbers: the work of Goldston-Pintz-Yıldırım, Bull. Amer. Math. Soc. (N.S.) 44 (2007), no. 1, 1-18, DOI 10.1090/S0273-0979-06-01142-6. MR2265008

[167] E. M. Stein and R. Shakarchi, Real analysis: Measure theory, integration, and Hilbert spaces, Princeton Lectures in Analysis, vol. 3, Princeton University Press, Princeton, NJ, 2005. MR2129625

[168] D. W. Stroock, Probability theory: An analytic view, 2nd ed., Cambridge University Press, Cambridge, 2011. MR2760872

[169] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Acta Arith. 27 (1975), 199-245, DOI 10.4064/aa-27-1-199-245. Collection of articles in memory of Jurii Vladimirovič Linnik. MR0369312

[170] T. Tao, The parity problem in sieve methods, blog post (2007). URL: https://terrytao.wordpress.com/2007/06/05/open-question-the-parity-problem-in-sieve-theory/

[171] T. Tao, Polymath8b: Bounded intervals with many primes, after Maynard, blog post (2013). URL: https://terrytao.wordpress.com/2013/11/19/polymath8b-bounded-intervals-with-many-primes-after-maynard/

[172] G. Tenenbaum, Introduction to analytic and probabilistic number theory, 3rd ed., Graduate Studies in Mathematics, vol. 163, American Mathematical Society, Providence, RI, 2015. Translated from the 2008 French edition by Patrick D. F. Ion. MR3363366

[173] E. C. Titchmarsh, The theory of functions, Oxford University Press, Oxford, 1958. Reprint of the second (1939) edition. MR3155290

[174] E. C. Titchmarsh, The theory of the Riemann zeta-function, 2nd ed., The Clarendon Press, Oxford University Press, New York, 1986. Edited and with a preface by D. R. Heath-Brown. MR882550

[175] C. de la Vallée Poussin, Recherches analytiques sur la théorie des nombres premiers, I-III, Ann. Soc. Sci. Bruxelles 20 (1896), 183-256, 281-362, 363-397.

[176] A. I. Vinogradov, The density hypothesis for Dirichlet L-series (Russian), Izv. Akad. Nauk SSSR Ser. Mat. 29 (1965), 903-934. MR0197414

[177] I. M. Vinogradov, Representation of an odd number as a sum of three primes, C. R. Acad. Sci. URSS 15 (1937), 6-7.

[178] I. M. Vinogradov, Simplest trigonometrical sums with primes, C. R. (Doklady) Acad. Sci. URSS (N.S.) 23 (1939), 615-617. MR0001763

[179] I. M. Vinogradov, On the estimations of some simplest trigonometrical sums involving prime numbers (Russian), Bull. Acad. Sci. URSS. Sér. Math. [Izvestia Akad. Nauk SSSR] (1939), 371-398.

[180] I. M. Vinogradov, The method of trigonometrical sums in the theory of numbers (Russian), Trav. Inst. Math. Stekloff 23 (1947), 109 pp.

[181] I. M. Vinogradov, A new estimate of the function ζ(1 + it) (Russian), Izv. Akad. Nauk SSSR. Ser. Mat. 22 (1958), 161-164. MR0103861

[182] A. Walfisz, Weylsche Exponentialsummen in der neueren Zahlentheorie (German), Mathematische Forschungsberichte, XV, VEB Deutscher Verlag der Wissenschaften, Berlin, 1963. MR0220685

[183] E. W. Weisstein, Bessel function of the first kind, from MathWorld—A Wolfram Web Resource. http://mathworld.wolfram.com/BesselFunctionoftheFirstKind.html




[184] E. W. Weisstein, Brun’s constant, from MathWorld—A Wolfram Web Resource. http://mathworld.wolfram.com/BrunsConstant.html

[185] E. Westzynthius, Über die Verteilung der Zahlen, die zu den n ersten Primzahlen teilerfremd sind, Comm. Phys. Math. Soc. Sci. Fenn. 25 (1931).

[186] E. Wirsing, Das asymptotische Verhalten von Summen über multiplikative Funktionen (German), Math. Ann. 143 (1961), 75-102, DOI 10.1007/BF01351892. MR0131389

[187] E. Wirsing, Das asymptotische Verhalten von Summen über multiplikative Funktionen. II (German), Acta Math. Acad. Sci. Hungar. 18 (1967), 411-467, DOI 10.1007/BF02280301. MR0223318

[188] Y. Zhang, Bounded gaps between primes, Ann. of Math. (2) 179 (2014), no. 3, 1121-1174, DOI 10.4007/annals.2014.179.3.7. MR3171761


Index 


δ-spaced mod 1,
3-4-1 inequality,

Abel’s summation formula, 13
abscissa of
  absolute convergence,
  convergence,
absolute constant, 8
additive character, 102
additive Fourier transform, 102
additive function,
admissible tuple, 181
anatomy of integers, 165
arithmetic function,
Axiom 1 of sieve theory, 185
Axiom 2 of sieve theory, 187
Axiom 2’ of sieve theory, 188
Axiom 3 of sieve theory, 188


Bernoulli number,
Bernoulli polynomial, 23
Bernoulli random variable,
beta sieve,
bilinear form, 260
bilinear sum,
Bombieri-Vinogradov theorem, 189, 277
Bonferroni inequalities,
Borel-Carathéodory theorem, 89
Brun’s constant,
Brun’s pure sieve, 176, 194
Brun’s sieve, 194
Brun-Hooley sieve, 203
Brun-Titchmarsh inequality, 206, 219
Buchstab’s function, 150
Buchstab’s identity,


character lift, 103
character of an abelian group, 100
Chebotarev Density Theorem, 187
Chebyshev’s bias, 116
Chebyshev’s estimate, 32
Chebyshev’s psi function, 22, 56
Chebyshev’s theta function, 13, 56
Chen’s theorem,
Chernoff’s inequality, 164
circle method,
combinatorial sieve, 194
completed L-function, 111
completely multiplicative function,
conductor of a character, 99, 103
covering system of congruences, 319
Cramér’s model, 3, 301, 317, 330
Cramér’s theory of large deviations, 164
Cramér-Granville model,
critical line, 64, 112
critical strip,


delay differential equation,
Dickman-de Bruijn function, 152, 169
Dirichlet L-function,
  analytic continuation, 111
  approximate functional equation, 117
  Euler product,
  exceptional character, 229
  exceptional zero,
  explicit formula, 98, 114




  functional equation, 111
  Hadamard product,
  non-trivial zero,
  root number, 111
  trivial zero, 112
  zero-free region,
Dirichlet character, 96
Dirichlet convolution,
Dirichlet inverse,
Dirichlet series,
Dirichlet’s hyperbola method, 9
distance of multiplicative functions, 87
divisor function,
divisor-bounded function, 131
duality principle, 266


Elliott-Halberstam conjecture, 189
Erdős-Kac theorem, 159
Euler product,
Euler-Maclaurin summation formula, 12, 15
Euler-Mascheroni constant,
even character, 111
exceptional character, 123
exponential sum, 241


faithful character,
Farey fraction, 260
Fejér kernel,
Fourier inversion,
  in finite abelian groups, 101
Fourier transform
  mod q, 102
  of sequences, 241
  on the real line,
fractional part, 9
Fundamental Lemma of Sieve Theory,


Gamma function,
  duplication formula,
  functional equation,
  reflection formula,
Gauss sum, 103
Generalized Riemann Hypothesis, 108, 112
Goldbach’s conjecture
  binary,
  ternary, 250
GPY sieve, 301
Green-Tao theorem, 4, 250, 319


Hankel contour, 140
Hankel’s formula, 132
Hardy’s function,
Hardy-Littlewood conjecture,
Hardy-Ramanujan theorem, 163
hyperbola method,


imprimitive character, 103
induced character, 103
ineffective constant,
integer part,
Iwaniec condition, 187

Jensen’s formula, 90
Kubilius model,


Landau-Siegel zero, 119, 123, 218, 220, 231
Laplace transform, 164
large sieve, 267
  additive version,
  arithmetic version, 271
  multiplicative version, 270
least quadratic non-residue, 276
least quadratic nonresidue, 274
level of distribution, 188
Lindelöf hypothesis,
Linnik’s theorem, 287
logarithmic integral, 1
lower bound sieve, 
LSD method, 132


Möbius function, 5
Möbius inversion formula,

Maier matrix, 330

major arc, 252
Markov’s inequality,
Maynard-Tao weights, 302
Mellin inversion, 54, 340
Mellin transform,
Mertens’ estimates, 9
method of moments, 159, 341
minor arcs, 252
monotonicity principle of sieve weights, 202, 219
Montgomery’s conjecture,
multiplicative character, 102
multiplicative function, 28


non-principal character, 100
norm of a bilinear form, 261


odd character, 111




Pólya-Vinogradov inequality, 106
Page’s theorem, 124
parity problem of sieve methods, 221
Parseval’s identity,
  for finite abelian groups, 101
partial summation,
Perron inversion formula,
Phragmén-Lindelöf principle,
Poisson summation formula,
  for Dirichlet characters, 105
Polignac’s conjecture, 174, 300
pretentious large sieve, 288, 292
pretentious multiplicative functions,
Prime Number Theorem,
  for arithmetic progressions,
primitive character, 103
principal character,
principal value,


quasi-linear sum, 236
quasi-smooth function, 236
quasi-smooth sum, 236


Rankin’s trick,
Riemann Hypothesis,
Riemann zeta function, 2
  approximate functional equation,
  Euler product,
  explicit formula,
  functional equation,
  Hadamard product,
  meromorphic continuation,
  non-trivial zero,
  trivial zero,
  zero-free region,
Riemann-Siegel formula,
Riemann-Stieltjes integral,
root number, 111
rough number, 149


saddle-point method,
Selberg’s sieve, 213
Shiu’s theorem, 209
Siegel’s theorem, 127
Siegel-Walfisz theorem, 118
sieve of Eratosthenes,
sieve of Eratosthenes-Legendre, 175
sifting dimension, 187
sifting limit, 200
smooth number, xi, 152, 169
square-full integer,
stationary point, 17, 165
Stirling’s formula,
subconvexity estimate,
summation by parts, 13, 14
summatory function,
Szemerédi’s theorem, 251
Szemerédi’s theorem, [251 


Titchmarsh-Linnik divisor problem, 207
totient function,
transference principle, 251
twin prime,
  conjecture,
  constant, 180
type I function, 236
type I sum, 236
type II function, 236
type II sum, 236


upper bound sieve, 


Vaughan’s identity, 237
Vinogradov’s conjecture, 274
von Mangoldt’s function, 


Wirsing’s theorem, 147


Prime numbers have fascinated mathematicians since the time of Euclid. This book presents some of our best tools to capture the properties of these fundamental objects, beginning with the most basic notions of asymptotic estimates and arriving at the forefront of mathematical research. Detailed proofs of the recent spectacular advances on small and large gaps between primes are made accessible for the first time in textbook form. Some other highlights include an introduction to probabilistic methods, a detailed study of sieves, and elements of the theory of pretentious multiplicative functions leading to a proof of Linnik’s theorem.


Magenta Photo Inc. 


Throughout, the emphasis has been placed on explaining the main ideas 
rather than the most general results available. As a result, several methods 
are presented in terms of concrete examples that simplify technical details, 
and theorems are stated in a form that facilitates the understanding of their 
proof at the cost of sacrificing some generality. Each chapter concludes with 
numerous exercises of various levels of difficulty aimed to exemplify the 
material, as well as to expose the readers to more advanced topics and point 
them to further reading sources. 


For additional information and updates on this book, visit
www.ams.org/bookpages/gsm-203

ISBN 978-1-4704-4754-0

GSM/203

www.ams.org


